
digitalmars.D - What's the go with the GC these days?

reply Manu <turkeyman gmail.com> writes:
I'm somewhere between a light GC user and a @nogc user, and I don't
really know much about where we're at, or much about state-of-the-art
GC in general.

I read comments like this:
https://www.reddit.com/r/programming/comments/acsg61/version_20840_of_dmd_the_d_reference_compiler_has/edbhgzi

"""
And so we do nothing and D continues to languish as every thread is
filled with discussion about a poor implementation of a sorely
outdated GC? Other languages in this domain get by just fine - Go and
Nim to name some off the top of my head (with the full disclosure that
the former isn't quite D performance and Nim uses a thread-local GC).
There are C / C++ libraries that provide a better GC. D's tradeoff is
basically having one of the worst GCs of any language I've used this
century to avoid a write barrier that causes a, what, 5%
nigh-deterministic overall reduction in performance (assuming we use
fast barriers out of a cycle).

And if a reasonably performant GC is fundamentally incompatible and
you think it will always be this way, then maybe D as a language needs
to make serious re-evaluations? D keeps saying "Don't Fear the
Reaper", but D is one of the only languages where I actually do.
"""

How much truth is in here?
What is this business about write barriers making the GC fast? Is that
actually fact, or are there other hurdles?

Is the claim that write-barriers generally slow your runtime
performance significant? And if it is, is it possible/reasonable to
control the places where they are emitted?
Can @nogc be used to inhibit write barriers in code that we know
doesn't interact with the GC, such that we have control over that loss
of perf?

It's obvious that the quality of the GC implementation is a mental
barrier to entry for many... and D has a GC, which is *attractive* to
some users. Despite the fact that I don't care about the GC much
personally, we do need to care about this as a group, and nobody seems
to be making substantial progress.

Is progress possible, or is the hard reality that the language is just
designed such to be resistant to a quality GC, while the ecosystem
sadly tends to rely on it?

Where's the ARC stuff? What happened to opAddRef/opRelease?
Jan 05
next sibling parent reply Meta <jared771 gmail.com> writes:
On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 Is progress possible, or is the hard reality that the language 
 is just designed such to be resistant to a quality GC, while 
 the ecosystem sadly tends to rely on it?

 Where's the ARC stuff? What happened to opAddRef/opRelease?
As per Andrei's talk at the last Dconf, ref counting requires __mutable to play nicely with const and immutable.
Jan 05
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Saturday, 5 January 2019 at 22:42:11 UTC, Meta wrote:
 On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 Is progress possible, or is the hard reality that the language 
 is just designed such to be resistant to a quality GC, while 
 the ecosystem sadly tends to rely on it?

 Where's the ARC stuff? What happened to opAddRef/opRelease?
As per Andrei's talk at the last Dconf, ref counting requires __mutable to play nicely with const and immutable.
I'd rather have opHeadMutable than __mutable, does the same thing but doesn't subvert the type system
Jan 05
next sibling parent reply Meta <jared771 gmail.com> writes:
On Saturday, 5 January 2019 at 23:51:53 UTC, Nicholas Wilson 
wrote:
 On Saturday, 5 January 2019 at 22:42:11 UTC, Meta wrote:
 On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 Is progress possible, or is the hard reality that the 
 language is just designed such to be resistant to a quality 
 GC, while the ecosystem sadly tends to rely on it?

 Where's the ARC stuff? What happened to opAddRef/opRelease?
As per Andrei's talk at the last Dconf, ref counting requires __mutable to play nicely with const and immutable.
I'd rather have opHeadMutable than __mutable, does the same thing but doesn't subvert the type system
I'm fairly dubious of adding __mutable as well, but I'm assuming the previous solution of using an AffixAllocator didn't pan out. I don't know enough about opHeadMutable to consider whether it would address the same problems.
Jan 05
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06.01.19 04:02, Meta wrote:
 On Saturday, 5 January 2019 at 23:51:53 UTC, Nicholas Wilson wrote:
 On Saturday, 5 January 2019 at 22:42:11 UTC, Meta wrote:
 On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 Is progress possible, or is the hard reality that the language is 
 just designed such to be resistant to a quality GC, while the 
 ecosystem sadly tends to rely on it?

 Where's the ARC stuff? What happened to opAddRef/opRelease?
As per Andrei's talk at the last Dconf, ref counting requires __mutable to play nicely with const and immutable.
I'd rather have opHeadMutable than __mutable, does the same thing but doesn't subvert  the type system
I'm fairly dubious of adding __mutable as well, but I'm assuming the previous solution of using an AffixAllocator didn't pan out.
It can't work without `__mutable` because it breaks transitivity of immutability. The entire point of `__mutable` is to add a transitivity escape hatch for system-level data structures and runtime implementations that doesn't block high-level optimizations based on immutability and purity.
 I don't know 
 enough about opHeadMutable to consider whether it would address the same 
 problems.
It's orthogonal. If you have an immutable object, it can't have a reference to a reference count without `__mutable`. opHeadMutable can't change that.
Jan 07
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 06.01.19 01:51, Nicholas Wilson wrote:
 On Saturday, 5 January 2019 at 22:42:11 UTC, Meta wrote:
 On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 Is progress possible, or is the hard reality that the language is 
 just designed such to be resistant to a quality GC, while the 
 ecosystem sadly tends to rely on it?

 Where's the ARC stuff? What happened to opAddRef/opRelease?
As per Andrei's talk at the last Dconf, ref counting requires __mutable to play nicely with const and immutable.
I'd rather have opHeadMutable than __mutable, does the same thing but doesn't subvert the type system
I'd like opHeadMutable too, but it certainly doesn't do the same thing.
Jan 07
prev sibling next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
 I'm somewhere between a light GC user and a @nogc user, and I don't
 really know much about where we're at, or much about state-of-the-art GC
 in general.
I use the GC unabashedly and only try to make sure I reuse memory when it's reasonably convenient. I've also looked into GC a bit.
 How much truth is in here?
D uses a simple GC. Simple means easy to reason about. It's also got better tools to stress the GC less.

One thing it could get that would be interesting is a thread-local GC. (You'd have a stop-the-world phase that happened infrequently. Casting something to shared would pin it in the thread it was allocated from.)

Another thing that it probably should have is a precise pointer map. For small allocations, that probably wouldn't be a net benefit -- the overhead of storing a typeinfo pointer by each allocation isn't awesome. But for mid-size arrays and larger, it could be helpful.

The Boehm collector is more advanced than D's GC. It's (at least optionally) generational with write barriers. Go is a GC-heavy language, only a bit less so than Java, so its GC performance makes a huge difference.
 What is this business about write barriers making the GC fast? Is that
 actually fact, or are there other hurdles?
With D's GC, your normal operation is unimpeded; you're not running GC code at all unless you allocate. This makes it easy to reason about the GC. However, the GC must then scan all your memory to determine if you have pointers to a memory allocation.

With write barriers, any write to any piece of memory that might contain pointers can be intercepted. The fast-but-inaccurate way is to use the MMU: mark that memory readonly and set up a fault handler. The handler will mark the memory read-write and enqueue that page of memory for scanning. The GC must also maintain a graph of which pages of memory point to which other pages of memory.
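The injected-code variant of a write barrier can be sketched as card marking: every pointer store also flags a fixed-size "card" covering the written slot, and the collector rescans only flagged cards. A minimal illustration -- `writeBarrier`, `cardTable`, and `CARD_SHIFT` are invented for the sketch; a real compiler would emit the equivalent of this call itself:

```d
// Sketch of a compiler-inserted ("injected code") write barrier using
// card marking. All names here are invented for illustration.
enum CARD_SHIFT = 9; // one card covers 512 bytes

// One dirty-flag byte per card; a real GC sizes this to the heap.
__gshared ubyte[1 << 16] cardTable;

// The call the compiler would emit in place of a plain pointer store.
void writeBarrier(void** slot, void* newValue)
{
    *slot = newValue; // the actual pointer write
    // Flag the card containing the written slot so the collector
    // rescans only that region instead of the whole heap.
    auto card = (cast(size_t) slot >> CARD_SHIFT) % cardTable.length;
    cardTable[card] = 1;
}

void main()
{
    static void* target; // a slot the GC would need to track
    writeBarrier(&target, new int);

    auto card = (cast(size_t) &target >> CARD_SHIFT) % cardTable.length;
    assert(cardTable[card] == 1); // the write was recorded
}
```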
 Is the claim that write-barriers generally slow your runtime performance
 significant? And if it is, is it possible/reasonable to control the
 places where they are emit?
You can control it a little. Any time you think you might possibly need to write to a pointer, you do a fake write to it in advance. An optimizing compiler might try to remove those writes, so maybe define it in assembly. This is hideously cumbersome. Alternatively, you could manually mark a segment of memory for the GC to clear the write barrier, and it will have to scan that region next collection.
 Can @nogc be used to inhibit write barriers in code that we know doesn't
 interact with the GC, such that we have control over that loss of perf?
The point of a write barrier is to record where pointers might have changed. @nogc doesn't mean that you can't change a pointer, so it can't sidestep write barriers. Not unless you want both memory leaks and use-after-free bugs.
 Is progress possible, or is the hard reality that the language is just
 designed such to be resistant to a quality GC, while the ecosystem sadly
 tends to rely on it?
Three things about D make it harder to make a good GC for it:

* unaligned pointers
* unions
* externally allocated memory (malloc and friends)

We've pretty much addressed malloc by telling people to manually add and remove malloced memory from what the GC scans. A union is pretty much just a pointer that might not be valid. Unaligned pointers just kind of suck.
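To make the union problem concrete, this is the kind of code that forces a conservative scan -- nothing at runtime tells the GC which member is live:

```d
// Why unions are a hurdle for a precise GC: the same word may be a
// pointer or an integer, and nothing at runtime says which is live.
union Slot
{
    int*   ptr;  // if live, the GC must keep the pointee alive
    size_t bits; // if live, the same bytes are just an integer
}

void main()
{
    Slot s;
    s.bits = 0xDEAD_BEEF; // might be mistaken for an address
    // A conservative GC has to treat s as a possible pointer and pin
    // whatever 0xDEAD_BEEF happens to land in; a precise GC would need
    // a runtime tag that D does not have.
    assert(s.bits == 0xDEAD_BEEF);
}
```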
Jan 05
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 3:12 PM, Neia Neutuladh wrote:
 The Boehm collector is more advanced than D's GC. It's (at least
 optionally) generational with write barriers.
Hmm, how does the Boehm collector insert write barriers? Back in the 90's, I implemented a collector that did write barriers by setting the page to read-only. When a write to it was executed, the resulting seg fault was captured, the write was logged, and the page was set to read-write. It was a fabulous idea, but the problem was it was SLOWER! The operating system's dealing with seg faults and changing write permissions was execrably slow.
Jan 05
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Sat, 05 Jan 2019 19:46:22 -0800, Walter Bright wrote:
 On 1/5/2019 3:12 PM, Neia Neutuladh wrote:
 The Boehm collector is more advanced than D's GC. It's (at least
 optionally) generational with write barriers.
Hmm, how does the Boehm collector insert write barriers?
The docs say it "provides incremental and generational collection under operating systems which provide the right kind of virtual memory support." The source code says they're using mprotect on most posix systems, vm_protect on OSX, and something on Windows.

The only other way I know of to implement a write barrier is to insert some code on every pointer write. GC-as-a-library would require you to use their pointer type instead of the builtin, and I expect most C++ devs would rather just use a reference counting pointer type instead.
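The mprotect approach can be sketched roughly as follows (a POSIX-only sketch; the SIGSEGV handler is elided because a real one must be async-signal-safe, so its job is performed inline in comments):

```d
// Rough shape of the virtual-memory write barrier (POSIX-only sketch).
import core.sys.posix.sys.mman;

void main()
{
    enum pageSize = 4096;

    // A page of GC-managed memory.
    void* page = mmap(null, pageSize, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANON, -1, 0);
    assert(page != MAP_FAILED);

    // After a collection the GC marks clean pages read-only.
    mprotect(page, pageSize, PROT_READ);

    // The first write would fault; the handler would log "page dirty",
    // restore write access, and return so the write retries. Here we
    // perform the handler's work directly instead of actually faulting:
    mprotect(page, pageSize, PROT_READ | PROT_WRITE);
    *cast(int*) page = 42; // succeeds; page is queued for rescanning

    assert(*cast(int*) page == 42);
    munmap(page, pageSize);
}
```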
 Back in the 90's, I implemented a collector that did write barriers by
 setting the page to read-only. When a write to it was executed, the
 resulting seg fault was captured, the write was logged, and the page was
 set to read-write.
 
 It was a fabulous idea, but the problem was it was SLOWER! The operating
 system's dealing with seg faults and changing write permissions was
 execrably slow.
Either the speed has improved, or they're just eating the time cost. The other side effect is that you (and the GC) have to be very careful about replacing other segfault handlers.
Jan 05
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 8:52 PM, Neia Neutuladh wrote:
 The only other way I know of to implement a write barrier is to insert
 some code on every pointer write. GC-as-a-library would require you to use
 their pointer type instead of the builtin, and I expect most C++ devs
 would rather just use a reference counting pointer type instead.
Microsoft tried this with their "Managed C++" variant of C++, where they had two fundamental pointer types. It was a laudable effort, but failed to gain traction. D learns from that mistake :-)
 Either the speed has improved, or they're just eating the time cost.
The fact that Java/Go/etc. use inserted write gates suggest the speed hasn't improved, which leaves eating the cost.
 The other side effect is that you (and the GC) have to be very careful
 about replacing other segfault handlers.
Yeah, you can do that with a sandboxed language, but not one that is a systems programming language where users will want to use to muck about with segfault handlers themselves.
Jan 05
next sibling parent reply Manu <turkeyman gmail.com> writes:
On Sat, Jan 5, 2019 at 10:55 PM Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 1/5/2019 8:52 PM, Neia Neutuladh wrote:
 The only other way I know of to implement a write barrier is to insert
 some code on every pointer write. GC-as-a-library would require you to use
 their pointer type instead of the builtin, and I expect most C++ devs
 would rather just use a reference counting pointer type instead.
Microsoft tried this with their "Managed C++" variant of C++, where they had two fundamental pointer types. It was a laudable effort, but failed to gain traction. D learns from that mistake :-)
I was very much 'there' when this was rejected by the community. I don't think it 'failed to gain traction' despite trying, so much as it was just plain rejected or dismissed. And not for any particularly good reason. It went like this among the ~hundreds of programmers I interacted with at the time:

1. "Managed C++" is code for "lame C#", or just "C# interop layer"
2. C++ users want absolutely nothing to do with C#, and nothing to do with the word 'managed'
3. C# users want absolutely nothing to do with C++
4. The feature didn't stand alone as a C++ feature
5. Even if it did, it was limited to Windows, and to MS compilers. Completely not-portable.

It's obvious why it failed. I don't think comment on the merit of the design has any relevance at all to that particular story. The particulars of the design and whether it was good or not couldn't have had less effect on the feature being ignored and/or rejected.
 Either the speed has improved, or they're just eating the time cost.
The fact that Java/Go/etc. use inserted write gates suggest the speed hasn't improved, which leaves eating the cost.
Could D insert such code, and also effectively elide it where nogc is used (or inferred)? If we can help the GC in common code, but still have the expressive power we need to elide that extra work where we know it's not necessary, then it might be reasonable to consider.
Jan 05
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 11:58 PM, Manu wrote:
 I was very much 'there' when this was rejected by the community. I
 don't think it 'failed to gain traction' despite trying, so much as it
 was just plain rejected or dismissed.
 And not for any particularly good reason. It went like this among the
 ~hundreds of programmers I interacted with at the time.
 1. "Managed C++" is code for "lame C#", or just "C# interop layer"
 2. C++ users want absolutely nothing to do with C#, and nothing to do
 with the word 'managed'
 3. C# users want absolutely nothing to do with C++
 4. The feature didn't stand alone as a C++ feature
 5. Even if it did, it was limited to windows, and to MS compilers.
 Completely not-portable.
I had a lot of experience with multiple pointer types from the DOS world. While it works, it is a constant headache. With every .. single .. pointer one had to decide: near*? far*? ss*? cs*? huge*? and Zortech's vptr*? And of course every .. single .. data .. structure had the same issues, and no, they could not interoperate.

Programmers (myself included) were very happy to abandon that way of doing things.
 It's obvious why it failed. I don't think comment on the merit of the
 design has any relevance at all to that particular story.
 The particulars of the design and whether it was good or not couldn't
 have had less affect on the feature being ignored and/or rejected.
It had a large effect in my experience. I'm an expert on near/far/etc, and I don't miss it like I don't miss EBCDIC. I was kinda glad to see Managed C++ fail, so I wouldn't get any pressure to implement it.

(I also recall programmers looking at it and responding "two kinds of pointers? Blech." Not just me.)
 Either the speed has improved, or they're just eating the time cost.
The fact that Java/Go/etc. use inserted write gates suggest the speed hasn't improved, which leaves eating the cost.
Could D insert such code, and also effectively elide it where nogc is used (or inferred)?
nogc code means no allocations happen, not that there are no pointers to the gc in the code. It's orthogonal.
 If we can help the GC in common code, but still have the expressive
 power we need to elide that extra work where we know it's not
 necessary, then it might be reasonable to consider.
If we could do that, we wouldn't need GC, the compiler could figure out the last use of each pointer. With all the research into GCs, nobody has ever figured that out.
Jan 06
next sibling parent reply Manu <turkeyman gmail.com> writes:
On Sun, Jan 6, 2019 at 12:55 AM Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 It's obvious why it failed. I don't think comment on the merit of the
 design has any relevance at all to that particular story.
 The particulars of the design and whether it was good or not couldn't
 have had less affect on the feature being ignored and/or rejected.
It had a large effect in my experience. I'm an expert on near/far/etc, and I don't miss it like I don't miss EBCDIC. I was kinda glad to see Managed C++ fail, so I wouldn't get any pressure to implement it. (I also recall programmers looking at it, and respond "two kinds of pointers? Blech." Not just me.)
But now we have heaps of pointers... just that they have ugly names. Now we have std::unique_ptr<T>, std::shared_ptr<T>, std::auto_ptr<T>, std::weak_ptr<T>. These are all just as equally 'multiple kinds of pointers' as `T^` was for ARC pointers, except they have hideous names, instead of a nice concise 1-byte syntax.

It's all irrelevant though; nobody's asking for multiple pointer types in D. All I want from language support for ARC in D is an opInc/opDec function which are called appropriately around assignments, and elided appropriately. Copy ctors can't offer this functionality.
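To make the request concrete, here is what such hooks might look like if written out by hand today. `opInc`/`opDec` as compiler-invoked, compiler-elidable hooks are hypothetical, so the calls the compiler would insert are written explicitly:

```d
// Hand-written stand-in for the hypothetical opInc/opDec hooks: the
// explicit calls below are what the compiler would insert (and, where
// provably balanced, elide). All names are illustrative.
struct RC(T)
{
    private static struct Payload { T value; size_t count; }
    private Payload* p;

    static RC make(T value) { return RC(new Payload(value, 1)); }

    void opInc() { if (p) ++p.count; }  // would run when a copy is made
    void opDec()                        // would run when a copy dies
    {
        if (p && --p.count == 0)
            p = null; // last owner; the GC reclaims the payload
    }

    size_t refs() const { return p ? p.count : 0; }
}

void main()
{
    auto a = RC!int.make(7);
    auto b = a;  // today no hook runs on this copy...
    a.opInc();   // ...so we insert the call the compiler would make
    assert(a.refs == 2);

    a.opDec();   // simulate one reference going out of scope
    assert(b.refs == 1);
}
```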
Jan 06
next sibling parent Ethan <gooberman gmail.com> writes:
On Sunday, 6 January 2019 at 09:40:25 UTC, Manu wrote:

 But now we have heaps of pointers... just that they have ugly 
 names.
 Now we have std::unique_ptr<T>, std::shared_ptr<T>, 
 std::auto_ptr<T>,
 std::weak_ptr<T>
 These are all just as equally 'multiple kinds of pointers' as 
 `T^` was
 for ARC pointers, except they have hideous names, instead of a 
 nice
 concise 1-byte syntax.
Not much to add to this discussion other than "Where's my ARC???" But, the impression I got from digging around UWP code for Quantum Break is that T^ is just syntactic sugar for a COM pointer. So C++ Windows programmers have had different types of pointers for much longer than C++11. If Microsoft had just come out and said "They're COM pointers" there'd have been far less gnashing of teeth.
Jan 06
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/6/2019 1:40 AM, Manu wrote:
 It's all irrelevant though, Nobody's asking for multiple pointer types
 in D. All I want from language support for ARC in D, is an opInc/opDec
 function which are called appropriately around assignments, and elided
 appropriately.
It turns out to be fiendishly difficult to automatically elide counter bumps.
 copy ctor's can't offer this functionality.
They can produce a working ref counting solution. D's will have a couple fundamental advantages over the C++ one:

1. D won't need the locking on the increment, because D has different types for threaded vs non-threaded.

2. With dip25 and dip1000, D can offer memory safe access to rc object contents.
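A minimal sketch of what this looks like -- ref counting built from just a postblit and a destructor, with no new language hooks (and no elision of the bumps):

```d
// Minimal ref counting from a postblit and a destructor: every copy
// bumps the count, every destruction drops it.
struct Counted
{
    private size_t* count;

    static Counted create()
    {
        auto c = Counted(new size_t);
        *c.count = 1;
        return c;
    }

    this(this) { if (count) ++*count; }  // copy: increment
    ~this()                              // destruction: decrement
    {
        if (count && --*count == 0)
            count = null; // last reference; the GC frees the counter
    }

    size_t refs() const { return count ? *count : 0; }
}

void main()
{
    auto a = Counted.create();
    {
        auto b = a;           // postblit fires
        assert(a.refs == 2);
    }                         // b's destructor fires
    assert(a.refs == 1);
}
```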
Jan 06
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Jan 06, 2019 at 01:25:32PM -0800, Walter Bright via Digitalmars-d wrote:
 On 1/6/2019 1:40 AM, Manu wrote:
 It's all irrelevant though, Nobody's asking for multiple pointer
 types in D. All I want from language support for ARC in D, is an
 opInc/opDec function which are called appropriately around
 assignments, and elided appropriately.
It turns out to be fiendishly difficult to automatically elide counter bumps.
Just out of curiosity, any concrete examples of difficulties that prevent easy elision of counter bumps? Just so we have a better idea of the challenges we're up against. Are there any common cases that are easy enough to implement, that might be "good enough", leaving more tricky cases aside?
 copy ctor's can't offer this functionality.
They can produce a working ref counting solution.
Care to elaborate? Is it related to how current move + postblit semantics make it impractical to maintain a consistent ref count?
 D's will have a couple fundamental advantages over the C++ one:
 
 1. D won't need the locking on the increment, because D has different
 types for threaded vs non-threaded.
 
 2. With dip25 and dip1000, D can offer memory safe access to rc object
 contents.
dip25 and dip1000 have been around for a long time now. What are the remaining issues blocking their full implementation / deployment? T -- Give a man a fish, and he eats once. Teach a man to fish, and he will sit forever.
Jan 06
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/6/2019 6:00 PM, H. S. Teoh wrote:
 On Sun, Jan 06, 2019 at 01:25:32PM -0800, Walter Bright via Digitalmars-d
wrote:
 It turns out to be fiendishly difficult to automatically elide counter
 bumps.
Just out of curiosity, any concrete examples of difficulties that prevent easy elision of counter bumps? Just so we have a better idea of the challenges we're up against. Are there any common cases that are easy enough to implement, that might be "good enough", leaving more tricky cases aside?
See: https://digitalmars.com/d/archives/digitalmars/D/draft_proposal_for_ref_counting_in_D_211885.html There's more on the Dlang-study mailing list: Dlang-study puremagic.com http://lists.puremagic.com/cgi-bin/mailman/listinfo/dlang-study
 copy ctor's can't offer this functionality.
They can produce a working ref counting solution.
Care to elaborate?
https://github.com/RazvanN7/DIPs/blob/efacbf8efde8027291113633984b0e7a039e8f30/DIPs/DIP1xxx-rn.md
 D's will have a couple fundamental advantages over the C++ one:

 1. D won't need the locking on the increment, because D has different
 types for threaded vs non-threaded.

 2. With dip25 and dip1000, D can offer memory safe access to rc object
 contents.
dip25 and dip1000 have been around for a long time now. What are the remaining issues blocking their full implementation / deployment?
dip25 is becoming the default compiler behavior. For dip1000, the next step is getting Phobos to compile with dip1000. Now that https://github.com/dlang/dmd/pull/8504 is merged, I still need to improve it with `return` inference.
Jan 06
prev sibling next sibling parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Monday, 7 January 2019 at 02:00:05 UTC, H. S. Teoh wrote:
 dip25 and dip1000 have been around for a long time now. What 
 are the remaining issues blocking their full implementation / 
 deployment?
For a long while, documentation. See the dates on https://github.com/dlang/dlang.org/pull/2453 -- a good waste of 4 1/2 months. But the documentation still needs to improve a lot.

dip25 is in the process of being turned on by default as an opt-out -transition switch and going through a deprecation cycle. dip1000 has changed quite a bit since the reviews, and I suspect it will need to go through more once the spec for it is nailed down more (hello dconf?), and it too will probably need to be transitioned in.
Jan 06
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Monday, 7 January 2019 at 02:00:05 UTC, H. S. Teoh wrote:
 Just out of curiosity, any concrete examples of difficulties 
 that prevent easy elision of counter bumps?
Let's say you obtain ownership of every other object in an array; then you have to prove that you also release every other object in that array before you return from the function.

If you always hold onto the ref-counted object in a named reference that is never changed, then it is reasonably easy to do. Then you only have to prove that the acquired object is released once while the reference is live.

So basically, whenever the ref-counted object is accessed through a graph-like structure, you need an advanced prover.
Jan 08
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 8 January 2019 at 10:56:43 UTC, Ola Fosheim Grøstad 
wrote:
 Let's say you obtain ownership of every other object in an 
 array, then you have to prove that  you also release every 
 other object in that array before you return from the function.
Or rather, that every other object has been acquired (locked to memory) by the context (the calling function). So, it is quite doable, but you have to use ideas used for proving program correctness, which is a difficult topic.
Jan 08
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Sunday, 6 January 2019 at 08:52:53 UTC, Walter Bright wrote:
 On 1/5/2019 11:58 PM, Manu wrote:
 If we can help the GC in common code, but still have the 
 expressive
 power we need to elide that extra work where we know it's not
 necessary, then it might be reasonable to consider.
If we could do that, we wouldn't need GC, the compiler could figure out the last use of each pointer. With all the research into GCs, nobody has ever figured that out.
Don't think of it as all or nothing. With dip1000, the compiler knows the lifetime of some objects and whether they have a reference that outlives the object's scope. With this information the compiler can replace some GC allocations with stack or specialized malloc allocations, with deallocation at the end of the scope. This simple addition to the compiler could increase the performance of GC code by a lot in poorly written code.

With some mentoring I could even implement it myself. It doesn't sound that hard (famous last words), since lifetime tracking is already implemented.
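A manual version of this transformation already exists in D: marking a local class reference `scope` makes the compiler allocate the object on the stack instead of the GC heap. The proposal is essentially to apply this automatically wherever dip1000 proves the reference doesn't escape:

```d
// `scope` on a local class reference makes the compiler allocate the
// object on the stack instead of the GC heap -- the manual form of the
// automatic transformation suggested above.
class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

int sumOfCoords()
{
    scope p = new Point(3, 4); // stack-allocated; no GC allocation
    return p.x + p.y;          // p must not escape this function
}

void main()
{
    assert(sumOfCoords() == 7);
}
```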
Jan 07
prev sibling parent Neia Neutuladh <neia ikeran.org> writes:
On Sat, 05 Jan 2019 23:58:55 -0800, Manu wrote:
 Could D insert such code, and also effectively elide it where @nogc is
 used (or inferred)?
Write barriers are how the GC maintains the graph of which memory points to which other memory, so it is very difficult to avoid them. You don't need to include them for writes to non-pointers. For injected runtime calls, this means any non-pointer. For mprotect(), it means the entire block of memory must be pointer-free.

You could use @nogc as a means to prove that you don't have to call the "pointer was written" code multiple times for the same address. The compiler has to prove that it's the same address, that no other thread could have written to that data, and that the GC couldn't have run in between. But D as is can't prove any of that.
Jan 06
prev sibling parent reply Russel Winder <russel winder.org.uk> writes:
On Sat, 2019-01-05 at 22:54 -0800, Walter Bright via Digitalmars-d wrote:
[…]
 The fact that Java/Go/etc. use inserted write gates suggest the speed
 hasn't improved, which leaves eating the cost.
[…]

Java GC continues to change and improve in leaps and bounds, both speed and latency. And indeed a lack of "stop the world time". The G1 GC that was seen as the very best there could be two Java versions ago has been surpassed again with Shenandoah. JVM GC people just keep coming up with ways of killing off GC cost.

-- 
Russel.
===========================================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Jan 06
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/6/2019 3:32 AM, Russel Winder wrote:
 Java GC continues to change and improve in leaps and bounds, both speed and
 latency. And indeed a lack of "stop the world time". The G1 GC that was seen
 as the very best there could be two Java versions ago has been surpassed again
 with Shenandoah. JVM GC people just keep coming up with ways of killing off GC
 cost.
1. Java has a very constrained interface to C, with a lot of rules to follow, mainly so the GC is not corrupted. D, being a systems programming language, simply does not have that luxury. 2. Let me know when Java lets go of write barriers!
Jan 06
next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sun, 2019-01-06 at 13:28 -0800, Walter Bright via Digitalmars-d wrote:
[…]
 1. Java has a very constrained interface to C, with a lot of rules to
 follow, mainly so the GC is not corrupted. D, being a systems
 programming language, simply does not have that luxury.
I understand that the Java language and its heap are very different from D, so the context and constraints are different. I am assuming Go is the same, though a priori I would think less different to D than Java. My concern (if it can be called that) is that the JVM folk and the Go folk spend a lot of time finding new and/or improved GC algorithms, whereas in D there appears to be no movement from an old and simple GC.
 2. Let me know when Java lets go of write barriers!
G1 GC certainly avoids write barriers whenever it can. I do not know the details, but there are conference papers out there looking at avoiding write barriers at all costs in the newer GC algorithms.

-- 
Russel.
===========================================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Jan 07
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 06:59:10PM +0000, Russel Winder via Digitalmars-d wrote:
 On Sun, 2019-01-06 at 13:28 -0800, Walter Bright via Digitalmars-d wrote:
 
[…]
 1. Java has a very constrained interface to C, with a lot of rules
 to follow, mainly so the GC is not corrupted. D, being a systems
 programming language, simply does not have that luxury.
I understand that the Java language and its heap are very different from D, so the context and constraints are different. I am assuming Go is the same, though a priori I would think less different to D than Java. My concern (if it can be called that) is that the JVM folk and the Go folk spend a lot of time finding new and/or improved GC algorithms, whereas in D there appears to be no movement from an old and simple GC.
IOW it's more of a matter of perception than an actual technical issue then? And just FYI, there *have* been gradual improvements to D's GC over the years, even if they are not ground-breaking ones.

In any case, as Walter said, Java / Go / similar languages impose quite a lot of restrictions on what the programmer can do, which gives the language implementors many more assumptions they can work with. D pretty much gives the programmer free rein, except perhaps under @safe, but even then the restrictions are pretty minimal and nowhere near the managed languages.

This leaves the language implementors very little room for maneuver, meaning that any prospective GC algorithm for D is constrained by being unable to make the same assumptions that a Java or Go GC can take for granted -- in fact, unable to make many assumptions at all. Naturally, this also rules out a lot of algorithms that could otherwise be applicable, and makes finding a significantly improved algorithm a very challenging task. (And that's assuming said significantly improved algorithm exists at all, under D's operating conditions.)
 2. Let me know when Java lets go of write barriers!
G1 GC certainly avoids write barriers whenever it can.
The stickler is in "whenever it can". Currently, Walter is operating under the premise of "no write barriers *at all*". Whether or not this is a tenable stance as far as GC improvement is concerned remains to be seen, but being able to fall back to write barriers occasionally is a completely different ballpark from having *zero* write barriers, *anywhere*. It may sound like a subtle difference, but it makes a world of difference as far as GC algorithms are concerned.
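[For readers unfamiliar with the term: a write barrier is a small piece of code the compiler emits alongside every pointer store so the GC can track cross-region references. A minimal sketch of one common scheme (card marking); all names here are invented for illustration, this is not druntime code:]

```d
// Hypothetical card-marking write barrier sketch (illustrative only).
enum cardShift = 9;                 // 512-byte "cards", a common choice
__gshared bool[1 << 20] cardTable;  // one dirty flag per card

void writeBarrier(void* containerAddr)
{
    // Mark the card holding the modified object dirty, so a minor
    // collection rescans only dirty cards, not the whole old heap.
    cardTable[(cast(size_t)containerAddr >> cardShift) % cardTable.length] = true;
}

// A barrier-emitting compiler would rewrite every pointer store
//     obj.field = p;
// into roughly
//     writeBarrier(cast(void*)obj); obj.field = p;
// That per-store cost is the overhead being debated in this thread.
```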
 I do not know the details, but there are conference papers out there
 looking at avoiding write barriers at all costs in the newer GC
 algorithms.
[...] If there are any GC algorithms that require no write barriers at all and still offer the same benefits as a modern generational GC, I'm sure we'll be jumping all over it in no time at all! Assuming that it doesn't make other assumptions incompatible with D, that is. T -- The best compiler is between your ears. -- Michael Abrash
Jan 07
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Jan 05, 2019 at 11:12:52PM +0000, Neia Neutuladh via Digitalmars-d
wrote:
 On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
 I'm somewhere between a light GC user and a @nogc user, and I don't
 really know much about where we're at, or much about
 state-of-the-art GC in general.
I use the GC unabashedly and only try to make sure I reuse memory when it's reasonably convenient. I've also looked into GC a bit.
I also use the GC freely, and only bother with GC optimization when my profiler shows that there's an actual problem. As I've said numerous times before, unless you're working on extremely time-sensitive code like real-time applications or 3D game engines, D's GC usually does not cause much noticeable difference -- unless you do something extreme like allocate tens of millions of small strings (or other small objects) per second, or allocate huge objects rapidly and expect unreferenced memory to be quickly reused.

And when the GC does start becoming a problem, there are often easy first-stab solutions that offer improvements that suffice for most cases, the most obvious one being calling GC.disable and then scheduling GC.collect yourself at more convenient times than when the default system might run collections. This requires the least code changes, and IME often yields quite good improvements.

Past that, the next most obvious one is to reduce GC pressure by reusing frequently-allocated objects -- standard advice for improving GC performance, y'know. This requires a bit more work, but should be obvious by identifying code hotspots with a profiler and examining which objects are being most frequently allocated. Use stdx.allocator to allocate from a pool instead, or just retain the object in a cache and reuse it the next time round, etc..

If the GC is still an issue after this, look into preallocating objects outside your main loop, so that you can control exactly when GC pauses happen. Again, standard advice for GC optimization.

Past this point, if GC remains a big issue, you could start pulling out @nogc and using malloc/free, etc.. (Though I might do malloc/free in the previous step if I know there are big objects I'm gonna need, and they have straightforward lifetimes that are easy to track -- this reduces the size of the GC heap, and thereby also improves collection performance.)
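[The GC.disable / GC.collect approach described above can be sketched with the real core.memory API; the frame loop and collection interval are made up for illustration:]

```d
import core.memory : GC;

void processFrame(int frame)
{
    auto buf = new ubyte[](1024); // may allocate freely
}

void mainLoop()
{
    GC.disable();            // no automatic collections from here on
    scope(exit) GC.enable();

    foreach (frame; 0 .. 1_000)
    {
        processFrame(frame);
        // Collect at a point of our choosing, e.g. every N frames,
        // instead of mid-frame when a pause would be visible.
        if (frame % 100 == 0)
            GC.collect();
    }
}
```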
As far as improving the GC itself is concerned, it will surely be nice, and an overall win for D, and certainly we shouldn't delay on doing this. But I don't think it's a life-or-death problem that we must fix Right Here And Now Or Else(tm).
 How much truth is in here?
D uses a simple GC. Simple means easy to reason about. It's also got better tools to stress the GC less.
Yeah, I find that with my recent, more idiomatic D code, I tend to use ranges with lazy evaluation a lot more than plain arrays, meaning that what I'd normally allocate as arrays in equivalent C/C++ code, in D no allocation (or significantly less allocation) actually happens because of lazy evaluation. The one place where I still tend to allocate more would be string manipulation, but even here, using slices instead of copying substrings like in C/C++ still reduces the actual number of allocations.

The situation is significantly different from GC-heavy languages like Java, where basically every non-trivial type requires an allocation with few or no alternatives or ways of bypassing it. The absence of by-value aggregates in Java causes your average code to allocate far more than the equivalent D code, not to mention the heavy OO emphasis often leads to many indirections like interfaces and vtables, which often also entail allocating adaptor objects, wrappers, etc.. So in Java, GC performance would play a much larger role in overall performance, whereas in D, the prevalence of by-value types like ranges, and passing small parcels of information as structs rather than classes, makes the GC pressure much lower, so GC performance is not as significant.

(Of course, it's possible for Java compilers to optimize away some allocations if object lifetimes can be statically determined -- I don't know if any Java compilers actually do this. Still, Java code does tend to be more allocation-heavy than typical D code.)
 But one thing it could get that would be interesting is a thread-local
 GC.
Yes, yes, and yes! This would allow per-thread segregation of the GC heap, which would allow much better control of GC pauses.
 (You'd have a stop-the-world phase that happened infrequently.
 Casting something to shared would pin it in the thread it was
 allocated from.)
How does this solve the problem of shared, though? The last time I checked, casting to/from shared is the main showstopper for a thread-local GC. [...]
 Is progress possible, or is the hard reality that the language is
 just designed such to be resistant to a quality GC, while the
 ecosystem sadly tends to rely on it?
Three things about D make it harder to make a good GC for it:

* unaligned pointers
* unions
* externally allocated memory (malloc and friends)

We've pretty much addressed malloc by telling people to manually add and remove malloced memory from what the GC scans. A union is pretty much just a pointer that might not be valid. Unaligned pointers just kind of suck.
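[The malloc workaround mentioned above is done with the real core.memory API; a minimal sketch:]

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

void example()
{
    enum n = 100;
    // Allocate outside the GC heap...
    void* p = malloc(n * (void*).sizeof);
    // ...but register the block so the GC scans it, in case it
    // holds pointers to GC-allocated objects.
    GC.addRange(p, n * (void*).sizeof);
    scope(exit)
    {
        GC.removeRange(p); // must come before free
        free(p);
    }
    // ... use p ...
}
```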
Unaligned pointers are generally just a bad idea IMO. I'm tempted to say it should be defined as UB, along with obscured pointers (like the doubly-linked list with 1 pointer per node trick using XOR). Maybe with a compiler / runtime switch to enable a more conservative GC.

Now that I think of it, we could deal with pointers in unions the same way -- if the compiler detects it, then trigger conservative mode in the GC. With these two out of the way, a generational GC for D seems closer to the realm of possibility.

T -- "How are you doing?" "Doing what?"
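[The XOR-linked-list trick mentioned above shows why obscured pointers defeat a tracing GC: the stored value is not a valid address, so a scanner can never discover the neighbours. A minimal sketch, illustrative only:]

```d
// XOR linked list: each node stores prev ^ next in a single field.
// A GC scanning this struct sees `link`, which is almost never a
// valid heap address -- so both neighbours look unreachable to it.
struct XNode
{
    int value;
    size_t link; // cast(size_t)prev ^ cast(size_t)next
}

XNode* next(XNode* cur, XNode* prev)
{
    // The next node is recoverable only if you already know the
    // previous one; the GC has no such context.
    return cast(XNode*)(cur.link ^ cast(size_t)prev);
}
```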
Jan 05
next sibling parent Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 6 January 2019 at 04:34:30 UTC, H. S. Teoh wrote:
 On Sat, Jan 05, 2019 at 11:12:52PM +0000, Neia Neutuladh via
 But one thing it could get that would be interesting is a 
 thread-local GC.
Yes, yes, and yes! This would allow per-thread segregation of the GC heap, which would allow much better control of GC pauses.
 (You'd have a stop-the-world phase that happened infrequently. 
 Casting something to shared would pin it in the thread it was 
 allocated from.)
How does this solve the problem of shared, though? The last time I checked, casting to/from shared is the main showstopper for a thread-local GC.
The recent shared thread by Manu was trying (only somewhat successfully) to tie shared to @safe/@trusted with an inductive model, i.e. the user writes a @trusted foundational layer and safety is verified from there on up the call stack. I wonder if that sort of a model would safely permit a thread-local GC?
 Three things about D make it harder to make a good GC for it:
 * unaligned pointers
 * unions
 * externally allocated memory (malloc and friends)
 
 We've pretty much addressed malloc by telling people to 
 manually add and remove malloced memory from what the GC 
 scans. A union is pretty much just a pointer that might not be 
 valid. Unaligned pointers just kind of suck.
Unaligned pointers are generally just a bad idea IMO. I'm tempted to say it should be defined as UB, along with obscured pointers (like the doubly-linked list with 1 pointer per node trick using XOR). Maybe with a compiler / runtime switch to enable a more conservative GC. Now that I think of it, we could deal with pointers in unions the same way -- if the compiler detects it, then trigger conservative mode in the GC. With these two out of the way, a generational GC for D seems closer to the realm of possibility.
With a precise GC that knows the types/layout of what it's dealing with, it ought to be possible to have a way to retrieve all the pointers from a type for scanning. For an inactive union that would be returning nothing. It could possibly work for an XOR list as well, not that I think we should be letting that kind of monstrosity dictate our designs. Ditto for unaligned pointers.
Jan 05
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 8:34 PM, H. S. Teoh wrote:
 Of course, it's possible for Java compilers to optimize away some
 allocations if object lifetimes can be statically determined -- I don't
 know if any Java compilers actually do this.
They do. I forgot the jargon term for it, but they'll determine if the lifetime does not escape the function scope, and allocate it on the stack instead of the heap.
Jan 05
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 6 January 2019 at 06:57:40 UTC, Walter Bright wrote:
 On 1/5/2019 8:34 PM, H. S. Teoh wrote:
 Of course, it's possible for Java compilers to optimize away 
 some
 allocations if object lifetimes can be statically determined 
 -- I don't
 know if any Java compilers actually do this.
They do. I forgot the jargon term for it, but they'll determine if the lifetime does not escape the function scope, and allocate it on the stack instead of the heap.
Heap to stack promotion? BTW LDC does this https://github.com/ldc-developers/ldc/blob/master/gen/passes/GarbageCollect2Stack.cpp
Jan 05
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 11:31 PM, Nicholas Wilson wrote:
 Heap to stack promotion? BTW LDC does this 
https://github.com/ldc-developers/ldc/blob/master/gen/passes/GarbageCollect2Stack.cpp
I suppose that's as good a word as any! I just remember SROA as a candidate for the most pretentious, obtuse compiler acronym ever, right behind SFINAE.
Jan 06
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 6 January 2019 at 08:55:31 UTC, Walter Bright wrote:
 On 1/5/2019 11:31 PM, Nicholas Wilson wrote:
 Heap to stack promotion? BTW LDC does this 
 https://github.com/ldc-developers/ldc/blob/master/gen/passes/GarbageCollect2Stack.cpp
I suppose that's as good a word as any! I just remember SROA as a candidate for the most pretentious, obtuse compiler acronym ever, right behind SFINAE.
SROA is at least intuitive in what it does and makes sense when you consider padding; SFINAE OTOH, I know what it stands for, but I have no idea what it does or why. Mind you, that's par for the course for C++.
Jan 06
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/6/2019 7:05 PM, Nicholas Wilson wrote:
 SROA is at least intuitive in what is does and makes sense when you consider 
 padding,
I would have called something like Aggregate Slicing, or simply Slice-O-Matic :-)
 SFINAE OTOH, I know what it stands for, I have no idea what it does or 
 why.
SFINAE => Substitution Failure Is Not An Error

It means if a function template instantiation does not produce compilable code, the function is (silently) not instantiated. In D we do this with template constraints.
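[Walter's last sentence can be illustrated with a small template-constraint example; the function name is made up for illustration:]

```d
import std.range : isInputRange, front;

// This overload exists only for types satisfying the constraint; for
// anything else, overload resolution simply skips it -- much as SFINAE
// silently discards a failed substitution in C++, but with a readable
// boolean condition instead of a substitution trick.
auto firstElement(R)(R r)
    if (isInputRange!R)
{
    return r.front;
}

// firstElement([1, 2, 3]) compiles; firstElement(42) is rejected with
// an error saying no overload matches the constraint.
```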
Jan 06
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/5/19 6:12 PM, Neia Neutuladh wrote:
 On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
 Is progress possible, or is the hard reality that the language is just
 designed such to be resistant to a quality GC, while the ecosystem sadly
 tends to rely on it?
Three things about D make it harder to make a good GC for it: * unaligned pointers
Unaligned pointers are not scanned, and so this is not a problem. As far as the GC is concerned, they aren't pointers. Perhaps you meant interior pointers?
 * unions
Unions and void * (or essentially untyped blocks) are the biggest problems. But depending on what we want to implement, we may have solutions for that.

It's difficult to paint with broad brushes here. There are certain problems that prevent certain solutions. But, for instance, a precise GC is getting traction as we speak: https://github.com/dlang/druntime/pull/2418

But precise GC is not necessarily better performing. And it's still "stop-the-world". I think the biggest improvement for D's GC could be the thread-local pools (and avoid stop-the-world for those), but since casting shared and local data around is valid, I don't know how it can be implemented without language changes.

There are always extra heuristics and algorithmic improvements that people can analyze in the current GC as well.

-Steve
Jan 07
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Mon, 07 Jan 2019 10:21:58 -0500, Steven Schveighoffer wrote:
 On 1/5/19 6:12 PM, Neia Neutuladh wrote:
 On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
 Is progress possible, or is the hard reality that the language is just
 designed such to be resistant to a quality GC, while the ecosystem
 sadly tends to rely on it?
Three things about D make it harder to make a good GC for it: * unaligned pointers
Unaligned pointers are not scanned, and so this is not a problem. As far as the GC is concerned, they aren't pointers.
In that case, it's already solved! And in pretty much the same way as malloc'd memory. (I was going to check but was overcome with a bout of laziness.)
 * unions
Unions and void * (or essentially untyped blocks) are the biggest problems. But depending on what we want to implement, we may have solutions for that. It's difficult to paint with broad brushes here. There are certain problems that prevent certain solutions. But, for instance, a precise GC is getting traction as we speak: https://github.com/dlang/druntime/pull/2418
That's a more precise GC instead of a fully precise GC. Unions and void[] mean you can't have a fully precise GC.
 But precise GC is not necessarily better performing. And it's still
 "stop-the-world".
Right. Precise means you collect more data more often, and it means you use up cache lines for pointer maps. It might also fragment your heap more (each allocation needs a pointer map, or maybe you have different pools for sufficiently common things like all pointers or no pointers).
 I think the biggest improvement for D's GC could be the thread-local
 pools (and avoid stop-the-world for those), but since casting shared and
 local data around is valid, I don't know how it can be implemented
 without language changes.
Like I said before, casting to shared would have to invoke a runtime function. That's a runtime change and compiler change, but I don't think the spec says anything about whether any cast might invoke a runtime function.
Jan 07
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 06:59:37PM +0000, Neia Neutuladh via Digitalmars-d
wrote:
 On Mon, 07 Jan 2019 10:21:58 -0500, Steven Schveighoffer wrote:
[...]
 I think the biggest improvement for D's GC could be the thread-local
 pools (and avoid stop-the-world for those), but since casting shared
 and local data around is valid, I don't know how it can be
 implemented without language changes.
Like I said before, casting to shared would have to invoke a runtime function. That's a runtime change and compiler change, but I don't think the spec says anything about whether any cast might invoke a runtime function.
Casting between arrays of different types does already call a druntime function in some cases (e.g., casting int[] to S[] where S is some struct). So I'd say it's fair game. T -- Only boring people get bored. -- JM
Jan 07
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/7/19 1:59 PM, Neia Neutuladh wrote:
 On Mon, 07 Jan 2019 10:21:58 -0500, Steven Schveighoffer wrote:
 On 1/5/19 6:12 PM, Neia Neutuladh wrote:
 On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
 Is progress possible, or is the hard reality that the language is just
 designed such to be resistant to a quality GC, while the ecosystem
 sadly tends to rely on it?
Three things about D make it harder to make a good GC for it: * unaligned pointers
Unaligned pointers are not scanned, and so this is not a problem. As far as the GC is concerned, they aren't pointers.
In that case, it's already solved! And in pretty much the same way as malloc'd memory. (I was going to check but was overcome with a bout of laziness.)
 * unions
Unions and void * (or essentially untyped blocks) are the biggest problems. But depending on what we want to implement, we may have solutions for that. It's difficult to paint with broad brushes here. There are certain problems that prevent certain solutions. But, for instance, a precise GC is getting traction as we speak: https://github.com/dlang/druntime/pull/2418
That's a more precise GC instead of a fully precise GC. Unions and void[] mean you can't have a fully precise GC.
Yes, well said. It's more precise, and a fully precise GC is not possible due to how D works.
 
 But precise GC is not necessarily better performing. And it's still
 "stop-the-world".
Right. Precise means you collect more data more often, and it means you use up cache lines for pointer maps. It might also fragment your heap more (each allocation needs a pointer map, or maybe you have different pools for sufficiently common things like all pointers or no pointers).
The pointer maps are static, and so they are stored with typeinfo I believe. We already store a typeinfo pointer for everything but low-level allocations and basic types (for destruction). There may be tweakable cases where it's OK to be imprecise to achieve better speed, or have a "small pointer map optimization".
 
 I think the biggest improvement for D's GC could be the thread-local
 pools (and avoid stop-the-world for those), but since casting shared and
 local data around is valid, I don't know how it can be implemented
 without language changes.
Like I said before, casting to shared would have to invoke a runtime function. That's a runtime change and compiler change, but I don't think the spec says anything about whether any cast might invoke a runtime function.
This is doable. Casting to/from shared should be a rare thing, and probably fine to hook. However, the problem is more (for me) that the memory would have to be copied, as the memory is currently in one pool and has to be moved to another pool.

It's better probably to just allocate it shared to begin with, or copy it if you need to "cast" to/from shared. Some notion of tail-modified type constructors would be really useful here.

-Steve
Jan 07
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 02:33:56PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 1/7/19 1:59 PM, Neia Neutuladh wrote:
[...]
 Like I said before, casting to shared would have to invoke a runtime
 function. That's a runtime change and compiler change, but I don't
 think the spec says anything about whether any cast might invoke a
 runtime function.
 
This is doable. Casting to/from shared should be a rare thing, and probably fine to hook. However, the problem is more (for me) that the memory would have to be copied, as the memory is currently in one pool and has to be moved to another pool.
[...] I think what Neia has in mind, which he mentioned in another post, is to have the druntime function simply pin the object, whichever pool it belongs to. The idea being to tell the respective thread-local GC "this object might be referenced by another thread, do not collect or move". T -- "The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."
Jan 07
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/7/19 2:52 PM, H. S. Teoh wrote:
 On Mon, Jan 07, 2019 at 02:33:56PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 1/7/19 1:59 PM, Neia Neutuladh wrote:
[...]
 Like I said before, casting to shared would have to invoke a runtime
 function. That's a runtime change and compiler change, but I don't
 think the spec says anything about whether any cast might invoke a
 runtime function.
This is doable. Casting to/from shared should be a rare thing, and probably fine to hook. However, the problem is more (for me) that the memory would have to be copied, as the memory is currently in one pool and has to be moved to another pool.
[...] I think what Neia has in mind, which he mentioned in another post, is to have the druntime function simply pin the object, whichever pool it belongs to. The idea being to tell the respective thread-local GC "this object might be referenced by another thread, do not collect or move".
Imagine I have a shared piece of data, then I cast it to thread local. Now, it is able to point at thread local data, but it lives in the shared heap. Pinned or not, this means a local collection must scan the shared heap to see if it can collect any thread-local data. This defeats the purpose of having a thread-local heap in the first place (which you should be able to scan without having to stop-the-world). -Steve
Jan 07
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 03:28:26PM -0500, Steven Schveighoffer via
Digitalmars-d wrote:
 On 1/7/19 2:52 PM, H. S. Teoh wrote:
[...]
 I think what Neia has in mind, which he mentioned in another post,
 is to have the druntime function simply pin the object, whichever
 pool it belongs to.  The idea being to tell the respective
 thread-local GC "this object might be referenced by another thread,
 do not collect or move".
Imagine I have a shared piece of data, then I cast it to thread local. Now, it is able to point at thread local data, but it lives in the shared heap. Pinned or not, this means a local collection must scan the shared heap to see if it can collect any thread-local data. This defeats the purpose of having a thread-local heap in the first place (which you should be able to scan without having to stop-the-world).
[...] Hmph, good point. That sux. :-( I suppose that's where copying/moving the object comes in -- migrate it to a different pool somehow so that we can avoid stop-the-world. T -- They pretend to pay us, and we pretend to work. -- Russian saying
Jan 07
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Mon, 07 Jan 2019 13:06:22 -0800, H. S. Teoh wrote:
 Hmph, good point.  That sux. :-(  I suppose that's where copying/moving
 the object comes in -- migrate it to a different pool somehow so that we
 can avoid stop-the-world.
class Node
{
    enum Type { leaf, intermediate }
    Type type;
    union
    {
        Node child;
        ulong data;
    }
}

auto node = new Node;
node.type = Node.Type.leaf;
node.data = cast(ulong)cast(void*)node;

How do you copy this?

Pinning avoids those problems, but it still involves scanning memory as if doing a collection.
Jan 07
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 1/7/19 4:57 PM, Neia Neutuladh wrote:
 On Mon, 07 Jan 2019 13:06:22 -0800, H. S. Teoh wrote:
 Hmph, good point.  That sux. :-(  I suppose that's where copying/moving
 the object comes in -- migrate it to a different pool somehow so that we
 can avoid stop-the-world.
class Node
{
    enum Type { leaf, intermediate }
    Type type;
    union
    {
        Node child;
        ulong data;
    }
}

auto node = new Node;
node.type = Node.Type.leaf;
node.data = cast(ulong)cast(void*)node;

How do you copy this?
You copy the memory, then update the references to that memory. Or you make the user explicitly copy, and disallow the cast.
 Pinning avoids those problems, but it still involves scanning memory as if
 doing a collection.
Even if you pin, if you cast the above from shared to unshared, you then have made it so the thread-local pool cannot be scanned independently. If we get thread-local pools, and all scans have to be done on all thread local pools along with the shared pool, we haven't gained anything. -Steve
Jan 08
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Tue, 08 Jan 2019 09:21:23 -0500, Steven Schveighoffer wrote:
 You copy the memory, then update the references to that memory.
Again, in the case of unions and void[], how do you determine if something is a reference to that memory instead of a false pointer? Moving collectors *have* to be fully precise, and D doesn't allow a fully precise collector.
 Even if you pin, if you cast the above from shared to unshared, you then
 have made it so the thread-local pool cannot be scanned independently.
Because the reference to an object in the current thread might be stored inside an object allocated by a different thread's GC, and that might have happened without a cast. Fixing that would require redesigning shared.
 If we get thread-local pools, and all scans have to be done on all
 thread local pools along with the shared pool, we haven't gained
 anything.
We can shorten the stop-the-world phase to the size of shared memory. If you're only sharing a little data and have a decent number of threads, this would be a big improvement.
Jan 08
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 8 January 2019 at 18:04:26 UTC, Neia Neutuladh wrote:
 We can shorten the stop-the-world phase to the size of shared 
 memory. If you're only sharing a little data and have a decent 
 number of threads, this would be a big improvement.
Pointer storage isn't limited to shared memory?
Jan 08
parent Neia Neutuladh <neia ikeran.org> writes:
On Wed, 09 Jan 2019 00:24:14 +0000, Ola Fosheim Grøstad wrote:
 On Tuesday, 8 January 2019 at 18:04:26 UTC, Neia Neutuladh wrote:
 We can shorten the stop-the-world phase to the size of shared memory.
 If you're only sharing a little data and have a decent number of
 threads, this would be a big improvement.
Pointer storage isn't limited to shared memory?
1. Stop all the threads.
2. Scan shared memory: everything that's been allocated as shared, everything that's been explicitly casted to shared, and everything reachable from those objects. Use that to mark things in the current thread's local GC heap as reachable.
3. Resume every thread but this one.
4. Run mark/sweep on the current thread's heap.
5. Resume this thread.

If you have a small amount of shared memory, the stop-the-world phase would be pretty short.

As an alternative, this could hijack all the threads to run a parallel mark/sweep, where each individual thread can resume as soon as its own local mark/sweep finishes. If you have a worker thread that deals with very few pointers, it can resume very quickly after the shared mark/sweep phase.
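[The five steps above might be sketched in pseudocode; every type and function name here is hypothetical, no such interface exists in druntime:]

```d
// Pseudocode sketch of a per-thread collection with a short
// stop-the-world phase, following the numbered steps above.
void localCollect(ThreadHeap myHeap, SharedHeap sharedHeap)
{
    stopAllThreads();                        // 1. stop the world
    auto roots = sharedHeap.scanForLocalRefs(myHeap); // 2. find local
    myHeap.markReachable(roots);             //    objects shared memory
                                             //    can still reach
    resumeOtherThreads();                    // 3. world restarts early
    myHeap.markAndSweep();                   // 4. purely local work,
                                             //    no other thread waits
}                                            // 5. this thread resumes
```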
Jan 08
prev sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Mon, 07 Jan 2019 11:52:18 -0800, H. S. Teoh wrote:
 I think what Neia has in mind, which he mentioned in another post, is to
 have the druntime function simply pin the object, whichever pool it
 belongs to.  The idea being to tell the respective thread-local GC "this
 object might be referenced by another thread, do not collect or move".
She, not he.

Thinking about this some more, this strategy isn't as cheap or easy as I was hoping. Let's say you're using fearless:

    auto a = gcExclusive!Node;
    spawn({
        {
            auto tmp = a.lock;
            tmp.parent = new Object;
        }
        GC.threadLocal.collect();
        GC.threadLocal.minimize();
        {
            auto tmp = a.lock;
            writeln(tmp.parent.toString);
        }
    });

Where do we insert the runtime call to mark a.parent as shared? Fearless would have to:

1. Acquire the mutex
2. Tell the current thread's GC to add the locked object as a root
3. Run your code
4. Tell the thread's GC to pin everything it can find from `a` (by casting back to shared)
5. Release the mutex

This effectively incurs a mark (but not sweep) cycle every time you cast to shared. If it were just pinning one object, that would be a lot easier to sell.
Jan 07
parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 7, 2019 1:48:04 PM MST Neia Neutuladh via Digitalmars-d 
wrote:
 On Mon, 07 Jan 2019 11:52:18 -0800, H. S. Teoh wrote:
 I think what Neia has in mind, which he mentioned in another post, is to
 have the druntime function simply pin the object, whichever pool it
 belongs to.  The idea being to tell the respective thread-local GC "this
 object might be referenced by another thread, do not collect or move".
She, not he.

Thinking about this some more, this strategy isn't as cheap or easy as I was hoping. Let's say you're using fearless:

    auto a = gcExclusive!Node;
    spawn({
        {
            auto tmp = a.lock;
            tmp.parent = new Object;
        }
        GC.threadLocal.collect();
        GC.threadLocal.minimize();
        {
            auto tmp = a.lock;
            writeln(tmp.parent.toString);
        }
    });

Where do we insert the runtime call to mark a.parent as shared? Fearless would have to:

1. Acquire the mutex
2. Tell the current thread's GC to add the locked object as a root
3. Run your code
4. Tell the thread's GC to pin everything it can find from `a` (by casting back to shared)
5. Release the mutex

This effectively incurs a mark (but not sweep) cycle every time you cast to shared. If it were just pinning one object, that would be a lot easier to sell.
Casting between shared and thread-local doesn't have to be free, but it cannot be expensive, because unless you're using atomics, the way that it's designed pretty much requires that you cast away shared (after locking a mutex to protect the object) in order to operate on it. So, if casting away shared is expensive, then code using shared gets screwed.

Also, there isn't anything stopping a programmer from assigning a thread-local reference to something inside an object that has had shared cast away, and there likely isn't any reason for the programmer to cast anything to shared in that scenario. They'd just get rid of the thread-local reference to the shared object before releasing the mutex, meaning that the runtime couldn't even see that what had been assigned to one of its internals had gone from being treated as thread-local to shared.

Having objects go from shared to thread-local in such code could be possible as well, if a reference is taken from the object which has had shared cast away and then stored as thread-local. In fact, that sort of scenario is pretty much exactly what would happen with something like a consumer-producer queue. Programmers dealing with shared need to be aware of such things and deal with them appropriately in order to avoid having problems (which is part of why casting away shared has to be @trusted), but it's perfectly legal so long as no shared object is ever treated as thread-local when it can actually be operated on by multiple threads, and no thread-local objects end up on multiple threads at the same time.

Given that sort of situation, I don't see how we can have the GC accurately track whether objects are thread-local or shared. Casting is just too blunt an instrument and allows too much.

- Jonathan M Davis
Jan 07
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Mon, 07 Jan 2019 14:53:50 -0700, Jonathan M Davis wrote:
 Given that sort of situation, I don't see how we can have the
 GC accurately track whether objects are thread-local or shared. Casting
 is just too blunt an instrument and allows too much.
This is exactly the situation I brought up in the post you're replying to. I explained what the solution is in the post you just replied to. That was in fact the entire point of that post. It requires a number of careful steps. It's not automatic. It's *mostly* automatic, and you can wrap the rest in a library. But it's expensive and it makes it easy to write incorrect code.

To review:

When you cast to shared, the thread-local GC pins the thing as shared. It also finds every bit of its memory that you can reach from that object, the same way as if it were doing a mark/sweep collection, and *that* is also pinned as shared.

When you are modifying an object that's had shared cast away, you need to tell the GC not to run, or you need to pin objects-that-will-become-shared temporarily (for instance, by keeping local references to them).

When you are done modifying an object that's had shared cast away, you need to cast it back to shared.

This is not a good solution, so D isn't getting a thread-local GC that eliminates stop-the-world unless someone else comes up with a cleverer solution. On the other hand, it doesn't have a cost for casting away shared as such; casting shared(T) to const(T) is totally free.
Jan 07
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 10:27:15PM +0000, Neia Neutuladh via Digitalmars-d
wrote:
[...]
 To review:
 
 When you cast to shared, the thread-local GC pins the thing as shared.
 It also finds every bit of its memory that you can reach from that
 object, the same way as if it were doing a mark/sweep collection, and
 *that* is also pinned as shared.
 
 When you are modifying an object that's had shared cast away, you need
 to tell the GC not to run, or you need to pin
 objects-that-will-become-shared temporarily (for instance, by keeping
 local references to them).
This step is too easy to get wrong. Once you cast shared away, the type system can no longer help you identify it as shared (or "once was shared"). As far as the compiler is concerned, storing a pointer to a thread-local object into a shared object that has been cast to thread-local is identical to assigning one thread-local pointer to another. So you'll run into the problem Steven described, without any warning whatsoever.
 When you are done modifying an object that's had shared cast away, you
 need to cast it back to shared.
This is also too easy to miss, because under current D, such a cast is redundant and not done in practice. It also looks weird:

    class C { ... }
    shared(C) myObj;

    mutex.acquire();
    C tmp = cast(C) myObj;
    tmp.doStuff();
    myObj = cast(shared(C)) tmp; // people will not think to do this
    mutex.release();
 This is not a good solution, so D isn't getting a thread-local GC that
 eliminates stop-the-world unless someone else comes up with a cleverer
 solution. On the other hand, it doesn't have a cost for casting away
 shared as such; casting shared(T) to const(T) is totally free.
Yeah, shared is the fly in the ointment that prevents us from having a thread-local GC.

Perhaps there can be some way of detecting whether the code casts away shared? I mean, if your code never actually casts away shared, e.g. by using messaging or whatever for thread communication, then you won't run into this problem, and you could safely use a thread-local GC without any ill effects. Though all it takes is for *one* cast to exist *somewhere* and your code becomes memory-unsafe. So it cannot be enabled by default.

T

-- 
They say that "guns don't kill people, people kill people." Well I think the gun helps. If you just stood there and yelled BANG, I don't think you'd kill too many people. -- Eddie Izzard, Dressed to Kill
Jan 07
prev sibling parent reply Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Monday, January 7, 2019 3:27:15 PM MST Neia Neutuladh via Digitalmars-d 
wrote:
 On Mon, 07 Jan 2019 14:53:50 -0700, Jonathan M Davis wrote:
 Given that sort of situation, I don't see how we can have the
 GC accurately track whether objects are thread-local or shared. Casting
 is just too blunt an instrument and allows too much.
This is exactly the situation I brought up in the post you're replying to. I explained what the solution is in the post you just replied to. That was in fact the entire point of that post.
Well, then I clearly read over it way too quickly.
 It requires a number of careful steps. It's not automatic. It's *mostly*
 automatic, and you can wrap the rest in a library. But it's expensive and
 it makes it easy to write incorrect code.

 To review:

 When you cast to shared, the thread-local GC pins the thing as shared. It
 also finds every bit of its memory that you can reach from that object,
 the same way as if it were doing a mark/sweep collection, and *that* is
 also pinned as shared.

 When you are modifying an object that's had shared cast away, you need to
 tell the GC not to run, or you need to pin objects-that-will-become-shared
 temporarily (for instance, by keeping local references to them).

 When you are done modifying an object that's had shared cast away, you
 need to cast it back to shared.

 This is not a good solution, so D isn't getting a thread-local GC that
 eliminates stop-the-world unless someone else comes up with a cleverer
 solution. On the other hand, it doesn't have a cost for casting away
 shared as such; casting shared(T) to const(T) is totally free.
Yeah, such a solution wouldn't fly. Basically, you're talking about having to have a way to tell the GC that you're moving stuff between threads, and that would be so error prone that it's not even funny. It's already problematic enough to get code that deals with sharing data across threads right as it is.

If we could solve the forking problem on Windows so that we could actually have a cross-platform concurrent GC like the Linux one that Sociomantic has used, then that would likely give similar benefits (if not better) without having to muck with the type system. I forget exactly what the stop-the-world pause times were, but they were pretty low. If a thread really couldn't afford to be stopped at all, it would still need to be separate from the GC, just like now, but that would be true of a solution that involved thread-local heaps as well.

In any case, it's issues like these which definitely make it much harder to drastically improve D's GC. We have much worse constraints to work under than languages like Java, and we don't have the same kind of manpower trying to improve the situation. But at least D is set up in a way that works quite well with minimizing heap allocations - especially with the idioms that are typical in idiomatic D.

- Jonathan M Davis
Jan 07
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Tuesday, 8 January 2019 at 04:34:11 UTC, Jonathan M Davis 
wrote:
 On Monday, January 7, 2019 3:27:15 PM MST Neia Neutuladh via 
 Digitalmars-d wrote:
 On Mon, 07 Jan 2019 14:53:50 -0700, Jonathan M Davis wrote:
 Given that sort of situation, I don't see how we can have the
 GC accurately track whether objects are thread-local or 
 shared. Casting
 is just too blunt an instrument and allows too much.
This is exactly the situation I brought up in the post you're replying to. I explained what the solution is in the post you just replied to. That was in fact the entire point of that post.
Well, then I clearly read over it way too quickly.
 It requires a number of careful steps. It's not automatic. 
 It's *mostly* automatic, and you can wrap the rest in a 
 library. But it's expensive and it makes it easy to write 
 incorrect code.

 To review:

 When you cast to shared, the thread-local GC pins the thing as 
 shared. It also finds every bit of its memory that you can 
 reach from that object, the same way as if it were doing a 
 mark/sweep collection, and *that* is also pinned as shared.

 When you are modifying an object that's had shared cast away, 
 you need to tell the GC not to run, or you need to pin 
 objects-that-will-become-shared temporarily (for instance, by 
 keeping local references to them).

 When you are done modifying an object that's had shared cast 
 away, you need to cast it back to shared.

 This is not a good solution, so D isn't getting a thread-local 
 GC that eliminates stop-the-world unless someone else comes up 
 with a cleverer solution. On the other hand, it doesn't have a 
 cost for casting away shared as such; casting shared(T) to 
 const(T) is totally free.
Yeah, such a solution wouldn't fly. Basically, you're talking about having to have a way to tell the GC that you're moving stuff between threads, and that would be so error prone that it's not even funny. It's already problematic enough to get code that deals with sharing data across threads right as it is. If we could solve the forking problem on Windows so that we could actually have a cross-platform concurrent GC like the Linux one that Sociomantic has used, then that would likely give similar benefits (if not better) without having to muck with the type system. I forget exactly what the stop-the-world pause times were, but they were pretty low. If a thread really couldn't afford to be stopped at all, it would still need to be separate from the GC, just like now, but that would be true of a solution that involved thread-local heaps as well. In any case, it's issues like these which definitely make it much harder to drastically improve D's GC. We have much worse constraints to work under than languages like Java, and we don't have the same kind of manpower trying to improve the situation. But at least D is set up in a way that works quite well with minimizing heap allocations - especially with the idioms that are typical in idiomatic D. - Jonathan M Davis
Yet all the GC-enabled languages happened to beat Swift, with its reference counting, on the implementation of a high-performance userspace network driver:

"Safe and Secure Drivers in High-Level Languages: How to write PCIe drivers in Rust, Go, C#, Swift, Haskell, and OCaml"
https://media.ccc.de/v/35c3-9670-safe_and_secure_drivers_in_high-level_languages

The professor that organized the research thesis didn't even consider D for the project, and he had plenty to choose from.

And MIT is doing research on writing POSIX kernels in Go: https://github.com/mit-pdos/biscuit

Meanwhile, C# now has all the nice features from Project Midori for low-level programming, and is getting all the love from game developers fed up with C++, including some very well known ones.

The theme that D's GC cannot be improved won't take the language very far.
Jan 08
parent reply =?UTF-8?B?TcOhcmNpbw==?= Martins <marcioapm gmail.com> writes:
On Tuesday, 8 January 2019 at 08:44:35 UTC, Paulo Pinto wrote:
 The theme that D's GC cannot be improved won't take the 
 language very far.
I share the same sentiment. Every GC conversation ends up being about the performance of write barriers, shared, comparisons with other languages, or some other argument that doesn't move the bar at all.

Is the current implementation of simple MS not improvable in any way?
Jan 08
parent Reimer Behrends <behrends gmail.com> writes:
On Tuesday, 8 January 2019 at 09:31:09 UTC, Márcio Martins wrote:
 Is the current implementation of simple MS not improvable in 
 any way?
I'd say it is. I've written up a small benchmark here:

https://gist.github.com/rbehrends/3d4ab87d34e79722272189abc68eee00

Options:

* D with its native GC, compiled with ldc 1.13.0.
* jemalloc, version 5.1.0.
* The Boehm GC, version 8.0.2 with sequential marking.
* The Boehm GC with parallel marking.
* The Boehm GC using incremental GC.
* The Boehm GC with explicit deallocation instead of GC.
* System malloc on macOS 10.12.6.

This is a version of the "binary trees" benchmark [1], as it was the one allocation-oriented benchmark readily available and easily portable between languages. Because it is a very specialized benchmark (it allocates a *lot* of small fixed-size objects), I'll caution against generalizing its results, but it can give us some insights all the same.

Here's a sample run with several GC options enabled, run on a 2.6GHz Core i7 with four cores; code that doesn't do allocation/collection (i.e. checksum()) accounts for less than two seconds of runtime. Most actual tuned non-functional programs will spend only 10%-20% of their time doing allocation and collection, so this is not terribly representative of real applications and exaggerates differences in performance.
$ make benchmark DEPTH=21
# D garbage collector
/usr/bin/time ./btree-d 21 >/dev/null
        34.24 real        33.96 user         0.21 sys
# jemalloc explicit malloc()/free()
/usr/bin/time ./btree-jemalloc 21 >/dev/null
        21.62 real        21.35 user         0.22 sys
# Boehm GC with four parallel marker threads
GC_MARKERS=4 /usr/bin/time ./btree-gc 21 >/dev/null
         9.39 real        12.02 user         0.18 sys
# Boehm GC with single-threaded marking
GC_MARKERS=1 /usr/bin/time ./btree-gc 21 >/dev/null
        11.42 real        11.34 user         0.06 sys
# Boehm GC with explicit deallocation
GC_MARKERS=1 /usr/bin/time ./btree-gc-free 21 >/dev/null
        13.12 real        13.05 user         0.05 sys
# Boehm GC with incremental collection (single-threaded)
/usr/bin/time ./btree-gc-inc 21 >/dev/null
        20.74 real        17.21 user         6.29 sys
# System malloc()/free()
/usr/bin/time ./btree-sysmalloc 21 >/dev/null
        67.73 real        66.97 user         0.74 sys

The fastest version is the parallel-mark version of the Boehm-Demers-Weiser GC, which beats out all the competitors. The rear is brought up by system malloc() (to nobody's surprise, standard malloc()/free() on a Mac is dog slow).

The Boehm GC benchmarks are run with interior pointers disabled in order to prevent extra padding for the small objects that would blow up the heap size, but even with interior pointers enabled, the relative performance doesn't change all that much; the single-threaded run clocks in at around 16 seconds instead, still twice as fast as the D GC, and the version with parallel marking finishes in around 13 seconds. It is worth noting that the Boehm GC still does go through all the conservative marking machinery and does check interior pointers from the stack regardless, so a precise or semi-precise collector may still be able to gain additional performance.

Parallel marking gives us a noticeable speedup in wall clock time and a really big one in GC pauses (not displayed here) at the expense of using more CPU time.
Incremental collection is super expensive in comparison, due to the cost of mprotect(), for code that technically would not require write barriers. It still is on par with jemalloc() for throughput.

In terms of GC pause times, we get the following for the Boehm GC:

$ time GC_MARKERS=4 ./btree-gc 23 | tail -5
88 collections
0.056 seconds/collection
0.080 max seconds/collection
818 MB heap size
306 MB free bytes

This is again with parallel marking (four marker threads) and using a larger heap. The collection times are wall clock times, based on gettimeofday(). Note that most heaps will contain at least some non-pointer data; ours is almost 100% pointers. With single-threaded marking, the maximum collection time more than triples to nearly .3 seconds.

As an interesting aside, the maximum pause time for explicitly deleting trees in the benchmark is much larger than the GC pause time (nearly a second for jemalloc()) and would remain larger even if the delete-tree operation were parallelized. This is a good reminder that cascading deletes in RAII can also be problematic for pause times and generally require tradeoffs (deferred/concurrent deletion, sacrificing timely destructor calls) in order to work in real-time systems.

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/performance/binarytrees.html
Jan 08
prev sibling parent reply Francesco Mecca <me francescomecca.eu> writes:
On Tuesday, 8 January 2019 at 04:34:11 UTC, Jonathan M Davis 
wrote:
 If we could solve the forking problem on Windows so that we 
 could actually have a cross-platform concurrent GC like the 
 Linux one that Sociomantic has used, then that would likely 
 give similar benefits (if not better) without having to muck 
 with the type system. I forget exactly what the stop-the-world 
 pause times were, but they were pretty low. If a thread really 
 couldn't afford to be stopped at all, it would still need to be 
 separate from the GC, just like now, but that would be true of 
 a solution that involved thread-local heaps as well.
I don't want to talk about my work (mentored by Leandro of Sociomantic) until it is done, but the concurrent GC has been mentioned so many times in this thread that I can't refrain from showing current results.

These are the pause times for Dustmite:

Not forking GC (Linux):
     Collection time    Stop-the-world time
0    0.000139           0.000137
1    0.000172           0.000155
2    0.001917           0.001033
3    0.013672           0.007031
4    0.028831           0.017149
5    0.073201           0.042553
6    0.175916           0.103788
7    0.271328           0.179002
8    0.607061           0.375101

Forking GC (Linux):
     Collection time    Stop-the-world time
0    0.000128           0.000125
1    0.000025           0.000125
2    0.000163           0.000155
3    0.003665           0.000155
4    0.000569           0.000564
5    0.023067           0.000564
6    0.001888           0.001882
7    0.062640           0.001882
8    0.004437           0.004429
9    0.118210           0.004429
10   0.009619           0.009613
11   0.251923           0.009613

As you can see, the pause times are way lower. This is true for many other small programs used for benchmarking. Back in the old TangoRT times, the concurrent GC even improved program execution time, but this is no longer true, mainly because Martin Nowak implemented a similar strategy for the eager allocation of pools.

I was already contacted by Mike in order to report the status of the project for the SAOC, so I hope we can disclose more in the upcoming weeks.

This post highlights a possible way to implement the forking behaviour on Windows as well, but I haven't touched a Windows box in years:

https://rainers.github.io/visuald/druntime/concurrentgc.html

I would like to see a thread-local GC in the future as well, but I still have to understand the weak points of `shared` and how it can be reshaped.
Jan 08
parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Tuesday, 8 January 2019 at 09:39:07 UTC, Francesco Mecca wrote:
 As you can see the pause times are way lower.
 This is true for many other small programs used for 
 benchmarking.
 Back in the old TangoRT times the concurrent GC even improved 
 the program execution time but this is no longer true mainly 
 because Martin Nowak implemented a similar strategy for the 
 eager allocation of pools.

 I was already contacted by Mike in order to report the status 
 of the project for the SAOC so I hope we can disclose more in 
 the upcoming weeks.
That is just plain awesome. Good to see work is being done on that front.
Jan 09
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 2:05 PM, Manu wrote:
 How much truth is in here?
There's a lot of triggering on the word "GC". At some level, it doesn't matter how good or bad the GC is, it's "GC is bad". Even if your code never calls the GC.
 What is this business about write barriers making the GC fast? Is that
 actually fact, or are there other hurdles?
Here's an explanation: https://stackoverflow.com/questions/19154607/how-actually-card-table-and-writer-barrier-works There are many more if you google "gc write barrier". The penalty, of course, is extra code gets run for every write through a pointer. The benefits exceed the penalties for a language where all dynamic allocation is done via the GC. But for D, where quite a lot of it is not, it isn't so clear. What is clear is that if you're going to compare speed with C++, having those write barriers in there is going to be a serious penalty, because they'll be there even if you don't use the GC.
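To make the per-store penalty concrete, here is a minimal card-table write barrier of the kind the linked answer describes, sketched in D. This is illustrative only: `cardTable`, `heapBase`, and `cardShift` are hypothetical names, not part of druntime, and a real compiler would inline this logic around every pointer store it emits.

```d
enum cardShift = 9;              // 512-byte cards
__gshared ubyte[] cardTable;     // one dirty byte per card (runtime-owned)
__gshared void* heapBase;        // start of the GC-managed heap

// Every write through a pointer would be lowered to something like this:
void storePtr(void** slot, void* value)
{
    *slot = value;
    // Mark the card containing the written-to slot as dirty, so the next
    // collection rescans only dirty cards for cross-generation pointers
    // instead of the whole heap.
    cardTable[(cast(size_t) slot - cast(size_t) heapBase) >> cardShift] = 1;
}
```

The extra shift and store per pointer write is exactly the cost being debated: cheap in isolation, but paid unconditionally, GC or no GC.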
 Where's the ARC stuff? What happened to opAddRef/opRelease?
Andrei, Razvan, and I have decided that the most pragmatic way forward to support reference counting is to support copy constructors ala C++. C++ is clearly successful with this approach, and Andrei/Razvan concluded that D's postblit just doesn't cut it. Razvan has a proposal for copy constructors, and an implementation PR that has fallen a bit behind his latest proposal.
Jan 05
next sibling parent reply Manu <turkeyman gmail.com> writes:
On Sat, Jan 5, 2019 at 7:45 PM Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 1/5/2019 2:05 PM, Manu wrote:
 How much truth is in here?
There's a lot of triggering on the word "GC". At some level, it doesn't matter how good or bad the GC is, it's "GC is bad". Even if your code never calls the GC.
 What is this business about write barriers making the GC fast? Is that
 actually fact, or are there other hurdles?
Here's an explanation: https://stackoverflow.com/questions/19154607/how-actually-card-table-and-writer-barrier-works There are many more if you google "gc write barrier". The penalty, of course, is extra code gets run for every write through a pointer. The benefits exceed the penalties for a language where all dynamic allocation is done via the GC. But for D, where quite a lot of it is not, it isn't so clear. What is clear is that if you're going to compare speed with C++, having those write barriers in there is going to be a serious penalty, because they'll be there even if you don't use the GC. > Where's the ARC stuff? What happened to opAddRef/opRelease? Andrei, Razvan, and I have decided that the most pragmatic way forward to support reference counting is to support copy constructors ala C++. C++ is clearly successful with this approach, and Andrei/Razvan concluded that D's postblit just doesn't cut it. Razvan has a proposal for copy constructors, and an implementation PR that has fallen a bit behind his latest proposal.
Hmnm, that's pretty disappointing... as a C++ user who has been doing C++ ref counting for years, I would say that, while maybe it is *kinda* successful, it's not at all satisfying, and it's one major reason I've been pushing it in D for the last 8-9 years. Ref counting in C++ is a sloppy hack.

I really just want the compiler to emit opInc and opDec appropriately on either side of a reference assignment, and let me take it from there. Also, with escape analysis, we can elide such calls very effectively, and we can't do that using the C++ strategy.

I think it's disappointing to embrace C++'s hacky 'solution'; we could do better.
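For context, the lowering being asked for might look roughly like the sketch below. It is hypothetical: `opInc`/`opDec` were only ever a proposal, and nothing in today's compiler calls them; the point is that the compiler, not a copy constructor, would own the increment/decrement pairs and could elide them.

```d
import core.atomic : atomicOp;

struct RC(T)
{
    T* payload;
    shared(int)* count;

    // Hypothetical hooks: under the proposal, the compiler would invoke
    // these around every copy and destruction of a reference, and elide
    // matched pairs when escape analysis proves the copy doesn't
    // outlive the source.
    void opInc() { atomicOp!"+="(*count, 1); }

    void opDec()
    {
        if (atomicOp!"-="(*count, 1) == 0)
        {
            // last reference gone; free payload here
        }
    }
}

// What `RC!int b = a;` would lower to under the proposal:
//   a.opInc();  b = a (bitwise);
// and at the end of b's scope:
//   b.opDec();
```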
Jan 05
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 11:48 PM, Manu wrote:
 Hmnm, that's pretty disappointing... as a C++ user who has been doing
 C++ ref counting for years, I would say that, while maybe it is
 *kinda* successful, it's not at all satisfying, and it's one major
 reason I've been pushing it in D for the last 8-9 years.
 Ref counting in C++ is a sloppy hack. I really just want the compiler
 to emit opInc and opDec appropriately on either sides of a reference
 assignment, and let me take it from there.
 Also, with escape analysis, we can elide such calls very effectively,
 and we can't do that using the C++ strategy.
 
 I think it's disappointing to embrace C++'s hacky 'solution', we could
 do better.
So far, we've failed at every attempt at it.
Jan 06
next sibling parent reply Manu <turkeyman gmail.com> writes:
On Sun, Jan 6, 2019 at 1:25 AM Walter Bright via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 1/5/2019 11:48 PM, Manu wrote:
 Hmnm, that's pretty disappointing... as a C++ user who has been doing
 C++ ref counting for years, I would say that, while maybe it is
 *kinda* successful, it's not at all satisfying, and it's one major
 reason I've been pushing it in D for the last 8-9 years.
 Ref counting in C++ is a sloppy hack. I really just want the compiler
 to emit opInc and opDec appropriately on either sides of a reference
 assignment, and let me take it from there.
 Also, with escape analysis, we can elide such calls very effectively,
 and we can't do that using the C++ strategy.

 I think it's disappointing to embrace C++'s hacky 'solution', we could
 do better.
So far, we've failed at every attempt at it.
Why did opInc/opDec fail? It was never available in any experimental form, I was never able to try it out. I've been waiting eagerly for almost a decade...
Jan 06
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/6/2019 1:42 AM, Manu wrote:
 Why did opInc/opDec fail? It was never available in any experimental
 form, I was never able to try it out. I've been waiting eagerly for
 almost a decade...
There was a looong thread about it. It wasn't implemented before major problems were found for it. See: https://digitalmars.com/d/archives/digitalmars/D/draft_proposal_for_ref_counting_in_D_211885.html There's more on the Dlang-study mailing list: Dlang-study puremagic.com http://lists.puremagic.com/cgi-bin/mailman/listinfo/dlang-study
Jan 06
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sun, 2019-01-06 at 01:22 -0800, Walter Bright via Digitalmars-d wrote:
 On 1/5/2019 11:48 PM, Manu wrote:
 Hmnm, that's pretty disappointing... as a C++ user who has been doing
 C++ ref counting for years, I would say that, while maybe it is
 *kinda* successful, it's not at all satisfying, and it's one major
 reason I've been pushing it in D for the last 8-9 years.
 Ref counting in C++ is a sloppy hack. I really just want the compiler
 to emit opInc and opDec appropriately on either sides of a reference
 assignment, and let me take it from there.
 Also, with escape analysis, we can elide such calls very effectively,
 and we can't do that using the C++ strategy.
 I think it's disappointing to embrace C++'s hacky 'solution', we could
 do better.
So far, we've failed at every attempt at it.
This is not an excuse to stop trying!

In this, Java and Go are excellent examples: someone comes up with the best that can be done, and then someone finds a new and usually better way of doing things.

I don't have any specific ideas, so I can't put my activity where my mouth is. Sorry.

-- 
Russel.
===========================================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Jan 07
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 07:18:02PM +0000, Russel Winder via Digitalmars-d wrote:
 On Sun, 2019-01-06 at 01:22 -0800, Walter Bright via Digitalmars-d wrote:
 On 1/5/2019 11:48 PM, Manu wrote:
 Hmnm, that's pretty disappointing... as a C++ user who has been
 doing C++ ref counting for years, I would say that, while maybe it
 is *kinda* successful, it's not at all satisfying, and it's one
 major reason I've been pushing it in D for the last 8-9 years.
 Ref counting in C++ is a sloppy hack. I really just want the
 compiler to emit opInc and opDec appropriately on either sides of
 a reference assignment, and let me take it from there.  Also, with
 escape analysis, we can elide such calls very effectively, and we
 can't do that using the C++ strategy.
 
 I think it's disappointing to embrace C++'s hacky 'solution', we
 could do better.
So far, we've failed at every attempt at it.
This is not an excuse to stop trying!
[...]

I think you may have misunderstood. Read the "study" mailing lists that Walter referred to. You'll see some concrete examples of tricky situations that may not be immediately obvious from a naive conception of ARC. It's not that we've stopped trying, but that we've run out of good ideas and nobody has stepped up offering better ones.

After reading through said mailing lists, I think I better understand now why Walter is pushing for dip25 and dip1000. They tackle essential issues related to proper escape analysis, which is a requirement for RC counter bump elisions that won't cause UB or other nasty problems. In themselves they won't give you RC just yet, but they pave the way and lay some of the essential foundations that an RC implementation will require.

T

-- 
Never step over a puddle, always step around it. Chances are that whatever made it is still dripping.
Jan 07
prev sibling parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 January 2019 at 07:48:03 UTC, Manu wrote:
 Ref counting in C++ is a sloppy hack. I really just want the 
 compiler
 to emit opInc and opDec appropriately on either sides of a 
 reference
 assignment, and let me take it from there.
I believe C++ compilers are allowed to do this for shared_ptr, but you would need global optimization for it to make much sense. But shared_ptr is not frequently used in performance sensitive loops, because it is slow, so it doesn't make a lot of sense for compiler implementors to spend that much time on it...
Jan 08
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2019-01-06 04:42, Walter Bright wrote:

  > Where's the ARC stuff? What happened to opAddRef/opRelease?
 
 Andrei, Razvan, and I have decided that the most pragmatic way forward 
 to support reference counting is to support copy constructors ala C++. 
 C++ is clearly successful with this approach, and Andrei/Razvan 
 concluded that D's postblit just doesn't cut it. Razvan has a proposal 
 for copy constructors, and an implementation PR that has fallen a bit 
 behind his latest proposal.
Razvan has confirmed that copy constructors are only for structs [1]. The ARC (opAddRef/opRelease) proposal was for classes as well. So copy constructors are not a substitute as defined now. [1] https://forum.dlang.org/post/hqqszotxfwauczupjmez forum.dlang.org -- /Jacob Carlborg
Jan 07
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 08:31:06PM +0100, Jacob Carlborg via Digitalmars-d
wrote:
 On 2019-01-06 04:42, Walter Bright wrote:
[...]
 Andrei, Razvan, and I have decided that the most pragmatic way
 forward to support reference counting is to support copy
 constructors ala C++.  C++ is clearly successful with this approach,
 and Andrei/Razvan concluded that D's postblit just doesn't cut it.
 Razvan has a proposal for copy constructors, and an implementation
 PR that has fallen a bit behind his latest proposal.
Razvan has confirmed that copy constructors are only for structs [1]. The ARC (opAddRef/opRelease) proposal was for classes as well. So copy constructors are not a substitute as defined now.
[...]

I think the idea is probably to use struct wrappers to implement RC, i.e., you'd have a struct wrapping a class reference of the object it's managing. As long as the struct can have semantics compatible with RC, it can be used to implement RC. At least in theory.

My suspicion, though, is that at some point further language support is going to be needed, because of interaction with D's const system. The RC count needs to be mutable, yet the managed object must be allowed to be const or immutable in a way that doesn't require onerous, incomplete workarounds (like using a Const template instead of just typing `const RC!MyClass` -- which won't work if you have the RC'd object inside another aggregate that also needs const/immutable qualifiers). Requiring that RC objects can only be mutable is going to lead to more dissatisfaction about RC in D.

T

-- 
The easy way is the wrong way, and the hard way is the stupid way. Pick one.
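A sketch of that struct-wrapper idea, and where const bites, assuming a hand-rolled wrapper (illustrative; Phobos has no exact equivalent of this type):

```d
struct RC(C) if (is(C == class))
{
    private C instance;
    private int* count;  // must stay mutable even when the wrapper is const

    this(C inst)
    {
        instance = inst;
        count = new int(1);
    }

    this(this) { ++*count; }   // postblit: bump the count on copy

    ~this()
    {
        if (count !is null && --*count == 0)
            destroy(instance); // last owner tears the object down
    }

    alias instance this;       // forward member access to the wrapped class
}
```

This works for mutable wrappers, but a `const RC!MyClass` cannot run `++*count` through its const view, and D has no C++-style `mutable` escape hatch, which is exactly the const-system gap described above.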
Jan 07
parent Jacob Carlborg <doob me.com> writes:
On 2019-01-07 20:46, H. S. Teoh wrote:

 I think the idea is probably to use struct wrappers to implement RC,
 i.e., you'd have a struct wrapping a class reference of the object it's
 managing.  As long as the struct can have semantics compatible with RC,
 it can be used to implement RC. At least in theory.
That will be quite ugly. Will these struct wrappers be covariant if the types they wrap are covariant? I mean, is that possible to implement? -- /Jacob Carlborg
Jan 08
prev sibling parent bioinfornatics <bioinfornatics fedoraproject.org> writes:
On Sunday, 6 January 2019 at 03:42:05 UTC, Walter Bright wrote:
 On 1/5/2019 2:05 PM, Manu wrote:
 How much truth is in here?
There's a lot of triggering on the word "GC". At some level, it doesn't matter how good or bad the GC is, it's "GC is bad". Even if your code never calls the GC.
 What is this business about write barriers making the GC fast? 
 Is that
 actually fact, or are there other hurdles?
Here's an explanation: https://stackoverflow.com/questions/19154607/how-actually-card-table-and-writer-barrier-works There are many more if you google "gc write barrier". The penalty, of course, is extra code gets run for every write through a pointer. The benefits exceed the penalties for a language where all dynamic allocation is done via the GC. But for D, where quite a lot of it is not, it isn't so clear. What is clear is that if you're going to compare speed with C++, having those write barriers in there is going to be a serious penalty, because they'll be there even if you don't use the GC.
 Where's the ARC stuff? What happened to opAddRef/opRelease?
Andrei, Razvan, and I have decided that the most pragmatic way forward to support reference counting is to support copy constructors ala C++. C++ is clearly successful with this approach, and Andrei/Razvan concluded that D's postblit just doesn't cut it. Razvan has a proposal for copy constructors, and an implementation PR that has fallen a bit behind his latest proposal.
Topics: get a better understanding of D's application bottlenecks. It could be interesting to be able to analyze execution traces and GC use with tools designed for that purpose. For example, process mining (a scientific field) seems interesting as a way to understand what happens and where the bottlenecks are. With such tools, people will blame the GC only when it is responsible. So a tool able to generate an event trace in the XES file format (http://www.processmining.org/logs/xes) would be awesome. Indeed, there are some tools, such as ProM (http://www.promtools.org/doku.php), which are able to analyse and represent it. Developers will get a better understanding with such (animated) representations: http://fluxicon.com/blog/wp-content/uploads/2013/11/Disco-Process-Mining-Animation.gif . Best regards
Jan 08
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
Some more on write barriers with LLVM:

https://llvm.org/docs/GarbageCollection.html
Jan 05
prev sibling next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Sat, 05 Jan 2019 20:34:30 -0800, H. S. Teoh wrote:
 On Sat, Jan 05, 2019 at 11:12:52PM +0000, Neia Neutuladh via
 Digitalmars-d wrote:
 On Sat, 05 Jan 2019 14:05:19 -0800, Manu wrote:
 I'm somewhere between a light GC user and a @nogc user, and I don't
 really know much about where we're at, or much about state-of-the-art
 GC in general.
I use the GC unabashedly and only try to make sure I reuse memory when it's reasonably convenient. I've also looked into GC a bit.
I also use the GC freely, and only bother with GC optimization when my profiler shows that there's an actual problem. As I've said numerous times before, unless you're working on extremely time-sensitive code like real-time applications or 3D game engines, D's GC usually does not cause much noticeable difference. Unless you do something extreme like allocate tens of millions of small strings (or other small objects) per second, or allocate huge objects rapidly and expect unreferenced memory to be quickly reused.
String allocation is one area where D makes it absurdly easy to reuse existing memory while other languages force you to jump through hoops. One of the reasons I know as much as I do about Unicode is that I ported some code from D to C# and was annoyed by the C# code taking 50 times as long and allocating 17000 times as much memory. The main culprits were UTF-16 and copy-on-substring. To resolve that, I made my own string type. (And used Mono's ahead-of-time compilation and did a couple other things.)
 (You'd have a stop-the-world phase that happened infrequently. Casting
 something to shared would pin it in the thread it was allocated from.)
How does this solve the problem of shared, though? The last time I checked, casting to/from shared is the main showstopper for a thread-local GC.
The issue with a thread-local GC and casting to shared is that the GC for thread A doesn't know about references held by thread B. That means thread A might collect and reuse that shared object. If we had a runtime call in the cast to shared, thread A's GC can record that that object is possibly referenced from another thread. It won't collect it until a stop-the-world phase says it's not referenced. This would give teeth to some of the undefined behavior we currently have.
 Now that I think of it, we could deal with pointers in unions the same
 way -- if the compiler detects it, then trigger conservative mode in the
 GC.
Right. In the past when people have looked at this problem, the idea was to treat anything that could possibly be a pointer as a pointer.
 With these two out of the way, a generational GC for D seems closer to
 the realm of possibility.
I'm not sure how much sense it makes to have a generational GC without write barriers.
Jan 05
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Jan 06, 2019 at 05:11:19AM +0000, Neia Neutuladh via Digitalmars-d
wrote:
 On Sat, 05 Jan 2019 20:34:30 -0800, H. S. Teoh wrote:
[...]
 One of the reasons I know as much as I do about Unicode is that I
 ported some code from D to C# and was annoyed by the C# code taking 50
 times as long and allocating 17000 times as much memory. The main
 culprits were UTF-16 and copy-on-substring. To resolve that, I made my
 own string type.  (And used Mono's ahead-of-time compilation and did a
 couple other things.)
Yeah, Walter's right on the money about copy-on-substring being a big performance hit in C/C++, and apparently also C# (caveat: I don't know anything about C#). D's slices really do eliminate a lot of the background cost that most people don't normally think about.
 (You'd have a stop-the-world phase that happened infrequently.
 Casting something to shared would pin it in the thread it was
 allocated from.)
How does this solve the problem of shared, though? The last time I checked, casting to/from shared is the main showstopper for a thread-local GC.
The issue with a thread-local GC and casting to shared is that the GC for thread A doesn't know about references held by thread B. That means thread A might collect and reuse that shared object. If we had a runtime call in the cast to shared, thread A's GC can record that that object is possibly referenced from another thread. It won't collect it until a stop-the-world phase says it's not referenced. This would give teeth to some of the undefined behavior we currently have.
Ahh, I see what you mean. Hmm, neat idea. Might actually work! Worth exploring, I think.
 Now that I think of it, we could deal with pointers in unions the
 same way -- if the compiler detects it, then trigger conservative
 mode in the GC.
Right. In the past when people have looked at this problem, the idea was to treat anything that could possibly be a pointer as a pointer.
 With these two out of the way, a generational GC for D seems closer
 to the realm of possibility.
I'm not sure how much sense it makes to have a generational GC without write barriers.
True. But it would allow at least a precise GC. And perhaps a few other GC improvements that are currently not possible. T -- Why waste time reinventing the wheel, when you could be reinventing the engine? -- Damian Conway
Jan 05
parent reply Neia Neutuladh <neia ikeran.org> writes:
On Sat, 05 Jan 2019 22:17:27 -0800, H. S. Teoh wrote:
 Yeah, Walter's right on the money about copy-on-substring being a big
 performance hit in C/C++, and apparently also C# (caveat: I don't know
 anything about C#).  D's slices really do eliminate a lot of the
 background cost that most people don't normally think about.
Interestingly, Java used to use slicing for substrings. This was changed (circa Java 7, IIRC). The logic was that you might read in a large string and preserve a short substring of it for a long time, and that costs you a lot of extra memory. I mean, you could just call `new String(substring)`, but apparently people weren't doing that and were confused by their memory usage.
Jan 05
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 11:03 PM, Neia Neutuladh wrote:
 Interestingly, Java used to use slicing for substrings. This was changed
 (circa java 7, IIRC). The logic was that you might read in a large string
 and preserve a short substring of it for a long time, and that costs you a
 lot of extra memory.
That's why DMD doesn't use slicing of the source file buffer for strings and identifiers. There's also a caching issue with it, as the slices will be spread out in memory rather than concentrated.
Jan 06
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/5/2019 9:11 PM, Neia Neutuladh wrote:
 I'm not sure how much sense it makes to have a generational GC without
 write barriers.
Eh, I invented a technique in the 90's to do that. Then I read a paper about it, it's called a "mostly copying" generational GC. All it does is pin the objects it isn't sure about (i.e. ambiguous pointers to them).
Jan 06
prev sibling next sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 How much truth is in here?
The truth of oft-repeated mantras? http://www.infognition.com/blog/2014/the_real_problem_with_gc_in_d.html taught us how small our heap has to be to keep performance. The situation has improved since, with -betterC, -profile=gc, allocators, @nogc. Pieces of knowledge have spread too (e.g. avoid void[]). The D GC doesn't seem to stop people from implementing fast systems. When we disabled the GC completely and went fully manual, we got zero speed improvement, only a decrease in memory usage. The GC is surprisingly affordable.
Jan 06
prev sibling next sibling parent reply Reimer Behrends <behrends gmail.com> writes:
On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 How much truth is in here?
 What is this business about write barriers making the GC fast? 
 Is that
 actually fact, or are there other hurdles?
First of all, stop-the-world GC gets an unnecessarily bad rap these days. A high throughput stop-the-world GC can actually be pretty competitive in terms of throughput, and latency may be lower than what you'd expect. You probably aren't going to run AAA video games with one, but there are many application domains where latency just doesn't matter or doesn't matter as much; even normal interactive use (such as your typical GUI application) often works plenty fine. Stop-the-world collectors are more of a problem for a functional programming language (or languages that exercise allocation-heavy functional idioms a lot), but less so for imperative languages that have more control over allocation frequency. In general, a modern precise stop-the-world collector should be able to handle 1GB worth of pointers in something like .2 seconds on modern hardware, give or take (especially if you can leverage parallelism). Given that much of the heap will generally be occupied by non-pointer data in an imperative language, such as strings, bitmaps, or numerical data, this is plenty for typical interactive applications (you will often have interaction hiccups of comparable size from other system sources) and reasonably written server-side software can use a multi-process approach to limit the per-process heap size. For batch programs and most command line tools, latency matters little, anyway. Throughput-wise, you should be able to perform on par with something like jemalloc, possibly faster, especially if you have compiler support (to e.g. inline pool allocations and avoid unnecessary zeroing of newly allocated memory if you statically know the object's size and layout). You generally won't be able to beat specialized allocators for throughput, though, just general purpose allocators. Note that manual memory allocation is not a free lunch, either. As an extreme case, normal reference counting is like having a very expensive write barrier that even applies to stack frames. 
Manual allocation only really wins out for throughput if you can use optimized allocation techniques (such as a pool/arena allocator where you throw the pool/arena away at the end). The problem with the D garbage collector is that it doesn't do well in terms of either throughput or latency. Even so, GC work is proportional to allocation work, so the impact on throughput in an imperative language like D is often less than you think, even if the GC doesn't have good performance. For example, I've used Tiny GC [1] in real world C/C++ applications where I needed to avoid external dependencies and the code still had good performance. That despite the fact that due its design (a conservative GC based on system malloc()), its throughput isn't particularly great. Fun fact: Erlang uses a stop-the-world collector (or used to, I haven't checked in a while), but as it is a distributed system with tons of processes with small heaps, it can also keep latency low, as each individual collection only has to traverse a limited amount of memory. In order to get incremental GC with low pause times, you will need one of several approaches: - Write barriers. - Read barriers. - Snapshot techniques. - Optimized reference counting. In practice, read barriers and snapshots without hardware support are often too inefficient for typical programming languages. Read barriers are still used (with heavy compiler support) in Azul's GC and Shenandoah, because (inter alia) they allow you to get around the last problem for extremely low latency GC when it comes to scanning stacks on a multi-threaded system. Snapshot-based systems are rare, but there was at one point a fork()-based snapshot collector for D1. The dual problem with that is that fork() is POSIX-specific and fork() does not get along well with threads. (It's possible, but a pain in the neck.) 
Write barriers are usually the technique of choice, because they only incur overhead when pointers are written to the heap, so they have fairly low and predictable overhead. Write barriers can also improve throughput. If you use generational collection *and* the generational hypothesis holds (i.e. you have a fair amount of short-lived objects), then the cost of the write barrier is usually more than offset by the greatly reduced tracing work necessary. You will need a write barrier for generational collection, anyway, so you get the incremental part for (mostly) free. Write barriers are also particularly attractive in functional (including mostly functional) languages, where pointer writes to the heap are uncommon to begin with. That said, in an imperative language you can often avoid having short-lived heap-allocated objects, so the generational hypothesis may not always apply. The most common use cases where generational GC might benefit D are probably if you are doing lots of work with strings or are incrementally growing a dynamic array starting from a small size. The overhead of a write barrier is context-dependent. Probably the worst case is something like sorting an array in place, where each pointer swap will require two write barriers. That said, these situations can be worked around, especially in an optimizing compiler that can elide multiple write barriers for the same object, and more typical code will not write pointers to the heap that much to begin with. Basic reference counting is normally dog slow due to memory locality issues. However, you can optimize the most performance degrading case (assignments to local variables) in a couple of ways. You can optimize away some, which is what Swift does [2]. Or you can use deferred reference counting, which counts only references from global variables and the heap, but requires occasional stack scanning. 
Reference counting is less attractive for the multi-threaded case with shared heaps, because atomic reference counting is even slower; competitive implementations of deferred reference counting for Java therefore buffer increments and decrements in thread-local storage and process them in batches [3]. Also, additional work has to be done to deal with cycles. And, of course, there are various hybrid approaches, such as generational collection for the minor heap and reference counting for the major heap. Going back to D, the obvious low-hanging fruit is to bring its throughput up to par with modern stop-the-world collectors. If your latency requirements are much higher than what that can give you, chances are that you are already operating in a highly specialized domain and have very specific requirements where you'd also have to invest non-trivial effort into tuning a general purpose GC. That said, having (optional) support for write barriers in the compiler would be pretty nice, though I don't know how much effort that would be. [1] https://github.com/ivmai/tinygc [2] https://github.com/apple/swift/blob/master/docs/ARCOptimization.rst [3] See David Bacon's seminal paper at https://researcher.watson.ibm.com/researcher/files/us-bacon/Bacon01Concurrent.pdf
Jan 06
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/6/2019 12:28 PM, Reimer Behrends wrote:
 In general, a modern precise stop-the-world collector should be able to
 handle 1GB worth of pointers in something like .2 seconds on modern
 hardware,
I use Thunderbird mail, latest version, and it has random pauses that are up to 10 seconds. It'll happen when I'm in the middle of typing, which is frustrating. I'm not positive it is a GC pause, maybe it's something else, but the randomness of it suggests it is GC.
Jan 06
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sun, Jan 06, 2019 at 07:34:41PM -0800, Walter Bright via Digitalmars-d wrote:
 On 1/6/2019 12:28 PM, Reimer Behrends wrote:
 In general, a modern precise stop-the-world collector should be able
 to handle 1GB worth of pointers in something like .2 seconds on
 modern hardware,
I use Thunderbird mail, latest version, and it has random pauses that are up to 10 seconds. It'll happen when I'm in the middle of typing, which is frustrating. I'm not positive it is a GC pause, maybe it's something else, but the randomness of it suggests it is GC.
I've experienced a similar effect in Firefox, and though I cannot say for sure it isn't a GC problem, I notice that it causes a long spike of intensive I/O, and appears to be correlated with occasional segfaults and signs of memory corruption / memory leak, and generally happens only after significant usage over a prolonged timeframe, generally ending up in a state of extreme memory usage (several GBs in resident set size for just a small number of persistent tabs) that reset to more reasonable levels upon restarting and restoring exactly the same tabs. T -- What are you when you run out of Monet? Baroque.
Jan 06
next sibling parent Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Monday, 7 January 2019 at 06:52:40 UTC, H. S. Teoh wrote:
 On Sun, Jan 06, 2019 at 07:34:41PM -0800, Walter Bright via 
 Digitalmars-d wrote:
 [...]
I've experienced a similar effect in Firefox, and though I cannot say for sure it isn't a GC problem, I notice that it causes a long spike of intensive I/O, and appears to be correlated with occasional segfaults and signs of memory corruption / memory leak, and generally happens only after significant usage over a prolonged timeframe, generally ending up in a state of extreme memory usage (several GBs in resident set size for just a small number of persistent tabs) that reset to more reasonable levels upon restarting and restoring exactly the same tabs.
I have that too. It always happens on JavaScript-heavy pages like Facebook or YouTube. Killing the process with the task manager is the quickest way to resolve the issue.
Jan 06
prev sibling next sibling parent Reimer Behrends <behrends gmail.com> writes:
On Monday, 7 January 2019 at 06:52:40 UTC, H. S. Teoh wrote:
 I've experienced a similar effect in Firefox, and though I 
 cannot say for sure it isn't a GC problem, I notice that it 
 causes a long spike of intensive I/O, and appears to be 
 correlated with occasional segfaults and signs of memory 
 corruption / memory leak, and generally happens only after 
 significant usage over a prolonged timeframe, generally ending 
 up in a state of extreme memory usage (several GBs in resident 
 set size for just a small number of persistent tabs) that reset 
 to more reasonable levels upon restarting and restoring exactly 
 the same tabs.
Firefox/Thunderbird are written mostly in C/C++/Rust. SpiderMonkey (the JS engine) does have a GC, but that one is incremental and generational and limited to JS memory. So I think it's unlikely that GC is the culprit. There are any number of reasons why Firefox/Thunderbird might freeze, from synchronous JavaScript to inefficient or misplaced [1] SQLite queries. [1] E.g. https://bugzilla.mozilla.org/show_bug.cgi?id=1266829
Jan 07
prev sibling parent welkam <wwwelkam gmail.com> writes:
On Monday, 7 January 2019 at 06:52:40 UTC, H. S. Teoh wrote:
 I've experienced a similar effect in Firefox, and though I 
 cannot say for sure it isn't a GC problem, I notice that it 
 causes a long spike of intensive I/O, and appears to be 
 correlated with occasional segfaults and signs of memory 
 corruption / memory leak, and generally happens only after 
 significant usage over a prolonged timeframe, generally ending 
 up in a state of extreme memory usage (several GBs in resident 
 set size for just a small number of persistent tabs) that reset 
 to more reasonable levels upon restarting and restoring exactly 
 the same tabs.


 T
This sounds a lot like memory fragmentation. See "The Curse of External Fragmentation: Relocate or Bust!": http://ithare.com/the-curse-of-external-fragmentation-relocate-or-bust/
Jan 07
prev sibling next sibling parent Russel Winder <russel winder.org.uk> writes:
On Sun, 2019-01-06 at 22:52 -0800, H. S. Teoh via Digitalmars-d wrote:
 On Sun, Jan 06, 2019 at 07:34:41PM -0800, Walter Bright via Digitalmars-d
 wrote:
 On 1/6/2019 12:28 PM, Reimer Behrends wrote:
 In general, a modern precise stop-the-world collector should be able
 to handle 1GB worth of pointers in something like .2 seconds on
 modern hardware,
 I use Thunderbird mail, latest version, and it has random pauses that are up to 10 seconds. It'll happen when I'm in the middle of typing, which is frustrating. I'm not positive it is a GC pause, maybe it's something else, but the randomness of it suggests it is GC.
 I've experienced a similar effect in Firefox, and though I cannot say for sure it isn't a GC problem, I notice that it causes a long spike of intensive I/O, and appears to be correlated with occasional segfaults and signs of memory corruption / memory leak, and generally happens only after significant usage over a prolonged timeframe, generally ending up in a state of extreme memory usage (several GBs in resident set size for just a small number of persistent tabs) that reset to more reasonable levels upon restarting and restoring exactly the same tabs.
Some plugins to JetBrains CLion, IntelliJ IDEA, GoLand, and PyCharm (the ones I use) can be prone to quite lengthy stoppages, and sadly the D plugin 1.18 suffers this. Many people blame the Java GC, usually from prejudice rather than actual data. The Rust plugin had such a problem some weeks back, but it turned out not to be a GC problem but an exception and logging problem. Whilst GCs can, indeed do, have problems, some people are too quick to blame the GC when actually the problems are elsewhere in most of the situations. -- Russel. Dr Russel Winder t: +44 20 7585 2200 41 Buckmaster Road m: +44 7770 465 077 London SW11 1EN, UK w: www.russel.org.uk
Jan 07
prev sibling next sibling parent Jacob Carlborg <doob me.com> writes:
On 2019-01-07 04:34, Walter Bright wrote:

 I use Thunderbird mail, latest version, and it has random pauses that 
 are up to 10 seconds. It'll happen when I'm in the middle of typing, 
 which is frustrating.
 
 I'm not positive it is a GC pause, maybe it's something else, but the 
 randomness of it suggests it is GC.
For me it only happens when downloading new emails, which suggests they're not using a separate thread for that. That would be a very poor implementation. -- /Jacob Carlborg
Jan 07
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 07:13:59PM +0000, Russel Winder via Digitalmars-d wrote:
[...]
 Whilst GCs can, indeed do, have problems, some people are too quick to
 blame the GC when actually the problems are elsewhere in most of the
 situations.
[...] Yep, that's typical GC phobia, fueled mostly by prejudice, fear, and ignorance, in the absence of hard evidence. The GC is a convenient whipping boy to blame for whatever malady you're suffering from when you're not sure what the actual cause is, and a convenient excuse not to use something you don't want to use for reasons you're too embarrassed to say out loud. T -- MSDOS = MicroSoft's Denial Of Service
Jan 07
prev sibling next sibling parent Radu <void null.pt> writes:
On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 I'm somewhere between a light GC user and a @nogc user, and I 
 don't really know much about where we're at, or much about 
 state-of-the-art GC in general.

 I read comments like this: 
 https://www.reddit.com/r/programming/comments/acsg61/version_20840_of_dmd_the_d_reference_compiler_has/edbhgzi

 """
 And so we do nothing and D continues to languish as every 
 thread is
 filled with discussion about a poor implementation of a sorely
 outdated GC? Other languages in this domain get by just fine - 
 Go and
 Nim to name some off the top of my head (with the full 
 disclosure that
 the former isn't quite D performance and Nim uses a 
 thread-local GC).
 There are C / C++ libraries that provide a better GC. D's 
 tradeoff is
 basically having one of the worst GCs of any language I've used 
 this
 century to avoid a write barrier that causes a, what, 5%
 nigh-deterministic overall reduction in performance (assuming 
 we use
 fast barriers out of a cycle).

 And if a reasonably performant GC is fundamentally incompatible 
 and
 you think it will always be this way, then maybe D as a 
 language needs
 to make serious re-evaluations? D keeps saying "Don't Fear the
 Reaper", but D is one of the only languages where I actually do.
 """

 How much truth is in here?
 What is this business about write barriers making the GC fast? 
 Is that
 actually fact, or are there other hurdles?

 Is the claim that write-barriers generally slow your runtime
 performance significant? And if it is, is it 
 possible/reasonable to
 control the places where they are emit?
 Can @nogc be used to inhibit write barriers in code that we know
 doesn't interact with the GC, such that we have control over 
 that loss
 of perf?

 It's obvious that the quality of the GC implementation is a 
 mental barrier to entry for many... and D has a GC, which is 
 *attractive* to some users. Despite the fact that I don't care 
 about the GC much personally, we do need to care about this as 
 a group, and nobody seems to be making substantial progress.

 Is progress possible, or is the hard reality that the language 
 is just designed such to be resistant to a quality GC, while 
 the ecosystem sadly tends to rely on it?

 Where's the ARC stuff? What happened to opAddRef/opRelease?
There is this article [1] which covers some of the details and opportunities for getting the D GC into better shape. For a while, improvements on the GC front were slow, but I see that lately Rainer has submitted some PRs [2] to get various GC issues sorted. My favorite one is the new precise GC [3]. I think a big improvement would be the use of thread-local heaps and GC. But this requires a rethink of `shared`, which would be nice to do anyhow. 1 - https://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html 2 - https://github.com/dlang/druntime/commits?author=rainers&since=2018-12-1&until=2019-01-07 3 - https://github.com/dlang/druntime/pull/2418
Jan 07
prev sibling parent reply welkam <wwwelkam gmail.com> writes:
On Saturday, 5 January 2019 at 22:05:19 UTC, Manu wrote:
 poor implementation of a sorely outdated GC?
The whole of r/programming needs to be garbage collected. I can't find a source, but if I recall correctly, mark-and-sweep GC has the highest program throughput of all GCs.
 What is this business about write barriers making the GC fast?
In Java and other managed languages, common code has lots of small short-lived objects and not a lot of pointer modifications, so adding write barriers can get you a concurrent GC almost for free. A programmer like you, who can turn heap allocations into stack allocations, use scope(exit) or similar for simple allocations, and leave only the difficult allocations to the GC, won't benefit from them.
 It's obvious that the quality of the GC implementation is a 
 mental barrier to entry for many...<...> we do need to care 
 about this as a group, and nobody seems to be making 
 substantial progress.
For things to change, we need to change minds. I have watched many debates in religion, climate change and politics and saw just how hard it is to change some people's opinions. The best hope is to target people in the middle who don't have strong opinions one way or another and repeatedly expose them to information that shows the GC is not a problem. It's not correct to say that people in this group don't care; it's just that everything needs work and this is not the only problem D has. I have an idea for a D blog post that would somewhat address this problem, but I know myself and I know there is a low chance of this idea materializing. I even picked a title: "Why D with GC is faster than your C/C++ code". TL;DR: memory management.
 Is progress possible, or is the hard reality that the language 
 is just designed such to be resistant to a quality GC
With dip1000, in principle we can reduce the overhead of the GC by turning GC allocations into stack or malloc allocations, making a generational GC less necessary.
Jan 07
parent reply JN <666total wp.pl> writes:
On Monday, 7 January 2019 at 15:54:32 UTC, welkam wrote:
 For things to change, we need to change minds. I have watched 
 many debates in religion, climate change and politics and saw 
 just how hard it is to change some people's opinions. The best 
 hope is to target people in the middle who don't have strong 
 opinions one way or another and repeatedly expose them to 
 information that shows the GC is not a problem.
I think for many people (especially C++ folks which seem to be the main target of D leadership) the mere existence of GC is enough to turn them off. Here's a quote I found on Rust subreddit today: "I remember getting excited about D in 2002 exactly because it was supposed to be 'C++-improved' and then learning that D had GC which made it less 'C++-improved' and more 'Java-improved'. I switched to Rust exactly because I find it 'C++-improved'." The mere existence of GC instantly puts D into application programming language rather than systems programming language for most people. Are they correct? Debatable. However the mental barrier is there and it's hard to get through with a nice message.
Jan 07
next sibling parent welkam <wwwelkam gmail.com> writes:
On Monday, 7 January 2019 at 17:18:49 UTC, JN wrote:
 On Monday, 7 January 2019 at 15:54:32 UTC, welkam wrote:
 For things to change, we need to change minds. I have watched 
 many debates in religion, climate change and politics and saw 
 just how hard it is to change some people's opinions. The best 
 hope is to target people in the middle who don't have strong 
 opinions one way or another and repeatedly expose them to 
 information that shows the GC is not a problem.
I think for many people (especially C++ folks which seem to be the main target of D leadership) the mere existence of GC is enough to turn them off. Here's a quote I found on Rust subreddit today: "I remember getting excited about D in 2002 exactly because it was supposed to be 'C++-improved' and then learning that D had GC which made it less 'C++-improved' and more 'Java-improved'. I switched to Rust exactly because I find it 'C++-improved'." The mere existence of GC instantly puts D into application programming language rather than systems programming language for most people. Are they correct? Debatable. However the mental barrier is there and it's hard to get through with a nice message.
Your sampling is biased. You pay attention to those who commented, not to the silent majority. People who don't have strong opinions and are open-minded won't comment with such strong opinions on reddit threads discussing language merits. They just read them, and those are the people you can convince with solid arguments.
Jan 07
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 05:18:49PM +0000, JN via Digitalmars-d wrote:
[...]
 I think for many people (especially C++ folks, who seem to be the
 main target of D leadership) the mere existence of a GC is enough to
 turn them off.
Yep. GC phobia is a common malady among C/C++ folks (I used to be one of them). I wrote D off the first time I found it because I was suffering from a particularly acute case of GC phobia, mostly colored by bad impressions of early versions of Java. It took me a while to come around. Reading this helped a little: https://dlang.org/spec/garbage.html but it didn't convince me; it only made me cautiously willing to try out D a little and see for myself. I don't think I was even willing to try D until I read that I could optionally still use malloc/free if I really wanted to, or if the GC turned out to be a problem. Clutching onto that option was an important crutch for my initial foray into D.

It wasn't until I actually wrote code that let the GC do its job that I was finally able to see, in retrospect, just how much time I had wasted basically reimplementing my own version of GC (manually, and rather poorly!) and debugging the associated tricky pointer code -- all of which was unnecessary with a GC. Furthermore, having a GC enabled a cleanness in my internal APIs that manually-managed memory did not allow: it removed memory management from being an integral aspect of my APIs and freed up the design space (and my mental concentration!) to focus on the problem domain instead of being bogged down with memory management details. The result was: (1) my code quality was better, from having eliminated an entire class of pointer bugs and from being able to focus on the problem domain rather than fiddle with memory management at every turn; (2) it had a better API that directly addressed the problem domain, uncluttered by dirty memory management specifics; and (3) I could achieve the above in a much shorter time (and with much less effort) than without the GC, since I had eliminated the big time sink of manual memory management. This was a clear win for me. So eventually I convinced myself. :-P

The other important thing I had to get over was the oft-cited performance concerns -- which in my case (and, I surmise, in the case of many others in the C/C++ camp) were colored by bad impressions of early versions of Java. It took me a while to convince myself that the chain of `delete`s in the C++ dtor of a large recursive structure is basically tantamount to a GC pause, just scheduled differently. Well, OK, it's not *exactly* the same thing, but it dispelled the wrong notion that malloc/free (or C++ new/delete) come "for free". You still have to do roughly the same amount of work and bookkeeping either way. Of course, the cost of a GC collection is likely more than manually freeing stuff, depending on the situation; but there's the tradeoff of spending easily 3x to 4x more effort writing correct pointer code under manual MM and *still* running the risk of pointer bugs (with the associated debugging effort -- and pointer bugs are notorious for being hard to debug), vs. paying a (usually) small performance cost and being much more productive, with the bonus of eliminating a whole class of pointer bugs "for free".

T

-- 
Study gravitation, it's a field with a lot of potential.
Jan 07
next sibling parent welkam <wwwelkam gmail.com> writes:
On Monday, 7 January 2019 at 18:18:05 UTC, H. S. Teoh wrote:
 but there's the tradeoff of spending easily 3x to 4x more 
 effort in writing correct pointer code under manual MM
Most code is slow because the programmer didn't have time to optimize it. For normal code the GC won't add more than ~15% overhead, but with some optimization you can easily get over a 2x improvement in your program. This could be an excellent topic for the D blog.
Jan 07
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Monday, 7 January 2019 at 18:18:05 UTC, H. S. Teoh wrote:
 Yep. GC phobia is a common malady among C/C++ folks (I used to 
 be one of them).
Guilty as charged here! What I like most about the GC is the efficiency! With scanning and a global owner, you don't need to keep ownership information anywhere. Which leads to less copying (std::string does a malloc for every string copy; the GC avoids that), slices being two machine words instead of 3, etc.
Jan 07
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 07, 2019 at 10:02:02PM +0000, Guillaume Piolat via Digitalmars-d wrote:
 On Monday, 7 January 2019 at 18:18:05 UTC, H. S. Teoh wrote:
 Yep. GC phobia is a common malady among C/C++ folks (I used to be
 one of them).
Guilty as charged here!
I wonder if some of us ex-GC-phobists(?) should throw together a D blog entry with brief summaries / excerpts of how we got over our GC phobia and came to embrace the GC. Could be a useful way to collect some common GC myths / arguments for GC in a place where we can point people to.
 What I like most about GC is the efficiency! With scanning and a
 global owner, you don't need to keep ownership information anywhere.
That's true! And having a dedicated collection thread also means you get better cache coherency and economies of scale, as opposed to individually freeing small objects here and there and incurring many RAM roundtrips.
 Which leads to less copying (std::string leads malloc for every string
 copy, GC avoids that), slices being two machine words instead of 3,
 etc.
Yeah, std::string's copy-on-assign and copy-on-substring do add a lot of overhead that people are often unaware of. Well, some people *are* aware of it, but the "solution" many adopt is to avoid std::* and go back to the bad ole world of char* and buffer-overrun heaven. Both are pessimal. D's slices r0x0rs.

T

-- 
Too many people have open minds but closed eyes.
Jan 07
prev sibling parent Russel Winder <russel winder.org.uk> writes:
On Mon, 2019-01-07 at 14:29 -0800, H. S. Teoh via Digitalmars-d wrote:
 […]
 I wonder if some of us ex-GC-phobists(?) should throw together a D blog
 entry with brief summaries / excerpts of how we got over our GC phobia
 and came to embrace the GC.  Could be a useful way to collect some
 common GC myths / arguments for GC in a place where we can point people
 to.
Or create an article for Overload or CVu which can then be a blog post on the D website somewhere.

-- 
Russel.
===========================================
Dr Russel Winder      t: +44 20 7585 2200
41 Buckmaster Road    m: +44 7770 465 077
London SW11 1EN, UK   w: www.russel.org.uk
Jan 07