digitalmars.D.learn - low-latency GC

Bruce Carneal <bcarneal gmail.com> writes:

How difficult would it be to add a selectable, low-latency GC to 
dlang?

Is it closer to "we can't get there from here" or "no big deal if 
you already have the low-latency GC in hand"?

I've heard Walter mention performance issues (write barriers, 
IIRC).  I'm also interested in the GC-flavor performance 
trade-offs, but here I'm just asking about feasibility.

Dec 05 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
 How difficult would it be to add a selectable, low-latency GC 
 to dlang?

 Is it closer to "we can't get there from here" or "no big deal 
 if you already have the low-latency GC in hand"?

 I've heard Walter mention performance issues (write barriers, 
 IIRC).  I'm also interested in the GC-flavor performance 
 trade-offs, but here I'm just asking about feasibility.

The only reasonable option for D is single threaded GC or ARC.

Dec 05 2020

Bruce Carneal <bcarneal gmail.com> writes:

On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
 How difficult would it be to add a selectable, low-latency GC 
 to dlang?

 Is it closer to "we can't get there from here" or "no big deal 
 if you already have the low-latency GC in hand"?

 I've heard Walter mention performance issues (write barriers, 
 IIRC).  I'm also interested in the GC-flavor performance 
 trade-offs, but here I'm just asking about feasibility.

 The only reasonable option for D is single threaded GC or ARC.

OK.  Some rationale?  Do you, for example, believe that 
no-probable-dlanger could benefit from a low-latency GC?  That it 
is too hard to implement?  That the language is somehow 
incompatible? That ...

Dec 05 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 05:41:05 UTC, Bruce Carneal wrote:
 OK.  Some rationale?  Do you, for example, believe that 
 no-probable-dlanger could benefit from a low-latency GC?  That 
 it is too hard to implement?  That the language is somehow 
 incompatible? That ...

The GC needs to scan all the affected call stacks before it can 
do incremental collection. Multi threaded GC is generally not 
compatible with low level programming.

Dec 05 2020

Bruce Carneal <bcarneal gmail.com> writes:

On Sunday, 6 December 2020 at 06:52:41 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 05:41:05 UTC, Bruce Carneal wrote:
 OK.  Some rationale?  Do you, for example, believe that 
 no-probable-dlanger could benefit from a low-latency GC?  That 
 it is too hard to implement?  That the language is somehow 
 incompatible? That ...

 The GC needs to scan all the affected call stacks before it can 
 do incremental collection. Multi threaded GC is generally not 
 compatible with low level programming.

GCs scan memory, sure.  Lots of variations.  Not germane.  Not a 
rationale.

D is employed at multiple "levels".  Whatever level you call it, 
Go and modern JVMs employ low latency GCs in multi-threaded 
environments.  Some people would like to use D at that "level".

My question remains: how difficult would it be to bring such 
technology to D as a GC option?  Is it precluded somehow by the 
language?   Is it doable but quite a lot of effort because ...?  
Is it no big deal once you have the GC itself because you only 
need xyz hooks? Is it ...?

Also, I think Walter may have been concerned about read barrier 
overhead but, again, I'm looking for feasibility information.  
What would it take to get something that we could compare?

Dec 05 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
 GCs scan memory, sure.  Lots of variations.  Not germane.  Not 
 a rationale.

We need to freeze the threads when collecting stacks/globals.

 D is employed at multiple "levels".  Whatever level you call 
 it, Go and modern JVMs employ low latency GCs in multi-threaded 
 environments.  Some people would like to use D at that "level".

Yes, but they don't allow low-level programming. Go also freezes 
to sync threads; this has a rather profound impact on code 
generation. They have spent a lot of effort on sync instructions 
in code gen to lower the latency, AFAIK.

 My question remains: how difficult would it be to bring such 
 technology to D as a GC option?  Is it precluded somehow by the 
 language?   Is it doable but quite a lot of effort because ...?
  Is it no big deal once you have the GC itself because you only 
 need xyz hooks? Is it ...?

Get rid of the system stack and globals. Use only closures and 
put in a restrictive memory model. Then maybe you can get a fully 
freeze-free multi-threaded GC.  That would be a different language.

 Also, I think Walter may have been concerned about read barrier 
 overhead but, again, I'm looking for feasibility information.  
 What would it take to get something that we could compare?

Just add ARC + single threaded GC. And even that is quite 
expensive.

Dec 06 2020

Bruce Carneal <bcarneal gmail.com> writes:

On Sunday, 6 December 2020 at 08:12:58 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
 GCs scan memory, sure.  Lots of variations.  Not germane.  Not 
 a rationale.

 We need to freeze the threads when collecting stacks/globals.

OK.  Low latency GCs exist.

 D is employed at multiple "levels".  Whatever level you call 
 it, Go and modern JVMs employ low latency GCs in 
 multi-threaded environments.  Some people would like to use D 
 at that "level".

 Yes, but they don't allow low-level programming. Go also freezes 
 to sync threads; this has a rather profound impact on code 
 generation. They have spent a lot of effort on sync 
 instructions in code gen to lower the latency, AFAIK.

So, much of the difficulty in bringing low-latency GC to dlang 
would be the large code gen changes required.  If it is a really 
big effort then that is all we need to know.  Not worth it until 
we can see a big payoff and have more resources.

 My question remains: how difficult would it be to bring such 
 technology to D as a GC option?  Is it precluded somehow by 
 the language?   Is it doable but quite a lot of effort because 
 ...?
  Is it no big deal once you have the GC itself because you 
 only need xyz hooks? Is it ...?

 Get rid of the system stack and globals. Use only closures and 
 put in a restrictive memory model. Then maybe you can get a 
 fully freeze-free multi-threaded GC.  That would be a different 
 language.

It would be, but I don't think it is the only way to get lower 
latency GC.  That said, if the code gen effort you mentioned 
earlier is a big deal, then no need to speculate/examine further.

 Also, I think Walter may have been concerned about read 
 barrier overhead but, again, I'm looking for feasibility 
 information.  What would it take to get something that we 
 could compare?

 Just add ARC + single threaded GC. And even that is quite 
 expensive.

Thanks for the feedback.

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 08:36:49 UTC, Bruce Carneal wrote:
 Yes, but they don't allow low-level programming. Go also 
 freezes to sync threads; this has a rather profound impact on 
 code generation. They have spent a lot of effort on sync 
 instructions in code gen to lower the latency, AFAIK.

 So, much of the difficulty in bringing low-latency GC to dlang 
 would be the large code gen changes required.  If it is a 
 really big effort then that is all we need to know.  Not worth 
 it until we can see a big payoff and have more resources.

Well, you could in theory avoid putting owning pointers on the 
stack/globals or require that they are registered as gc roots. 
Then you don't have to scan the stack. All you need then is write 
barriers. IIRC
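
For illustration, here is a minimal single-threaded C++ sketch of that idea: roots are registered explicitly instead of being found by scanning the stack, and every pointer store goes through a write barrier while an incremental mark phase is in progress. The names (`Obj`, `Collector`, `add_root`, `write`, `mark_step`) are hypothetical and not taken from druntime or any existing collector, and the sweep phase is left out.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical heap object: a mark bit plus pointer fields.
struct Obj {
    bool marked = false;
    std::vector<Obj*> fields;
};

// Hypothetical single-threaded incremental marker (assumption: no sweep shown).
struct Collector {
    std::unordered_set<Obj*> roots;  // explicitly registered roots, no stack scan
    std::vector<Obj*> gray;          // marked, but fields not yet scanned
    bool marking = false;

    // Mark an object and queue it for scanning if it was unmarked.
    void shade(Obj* o) {
        if (o && !o->marked) {
            o->marked = true;
            gray.push_back(o);
        }
    }

    void add_root(Obj* o) {
        roots.insert(o);
        if (marking) shade(o);       // a root added mid-cycle must not be missed
    }

    void remove_root(Obj* o) { roots.erase(o); }

    void start_mark() {
        marking = true;
        for (Obj* r : roots) shade(r);
    }

    // Write barrier: all pointer stores into heap objects go through here.
    void write(Obj* owner, std::size_t slot, Obj* value) {
        owner->fields[slot] = value;
        if (marking) shade(value);   // insertion barrier: shade the new target
    }

    // Scan at most `budget` objects; returns true once marking has finished.
    bool mark_step(std::size_t budget) {
        while (budget-- > 0 && !gray.empty()) {
            Obj* o = gray.back();
            gray.pop_back();
            for (Obj* f : o->fields) shade(f);
        }
        if (gray.empty()) marking = false;
        return !marking;
    }
};
```

Storing a pointer into an already-scanned object through `write` while marking is active shades the target, so a later `mark_step` cannot miss it; a plain store that bypassed the barrier could.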

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 08:59:49 UTC, Ola Fosheim Grostad 
wrote:
 Well, you could in theory avoid putting owning pointers on the 
 stack/globals or require that they are registered as gc roots. 
 Then you don't have to scan the stack. All you need then is 
 write barriers. IIRC

And read barriers... I assume. However, with a single-threaded 
incremental collector, write barriers should be enough.

Dec 06 2020

Bruce Carneal <bcarneal gmail.com> writes:

On Sunday, 6 December 2020 at 08:59:49 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 08:36:49 UTC, Bruce Carneal wrote:
 Yes, but they don't allow low-level programming. Go also 
 freezes to sync threads; this has a rather profound impact on 
 code generation. They have spent a lot of effort on sync 
 instructions in code gen to lower the latency, AFAIK.

 So, much of the difficulty in bringing low-latency GC to dlang 
 would be the large code gen changes required.  If it is a 
 really big effort then that is all we need to know.  Not worth 
 it until we can see a big payoff and have more resources.

 Well, you could in theory avoid putting owning pointers on the 
 stack/globals or require that they are registered as gc roots. 
 Then you don't have to scan the stack. All you need then is 
 write barriers. IIRC

'shared' with teeth?

Dec 06 2020

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 14:45:21 UTC, Bruce Carneal wrote:
 Well, you could in theory avoid putting owning pointers on the 
 stack/globals or require that they are registered as gc roots. 
 Then you don't have to scan the stack. All you need then is 
 write barriers. IIRC

 'shared' with teeth?

It was more a hypothetical, as read barriers are too expensive. 
But write barriers should be ok, so a single-threaded incremental 
collector could work well if D takes a principled stance that 
objects which are not 'shared' are not handed over to other 
threads without pinning them in the GC.

Maybe a better option for D than ARC, as it is closer to what 
people are used to.

Dec 06 2020

IGotD- <nise nise.com> writes:

On Sunday, 6 December 2020 at 15:44:32 UTC, Ola Fosheim Grøstad 
wrote:
 It was more a hypothetical, as read barriers are too expensive. 
 But write barriers should be ok, so a single-threaded 
 incremental collector could work well if D takes a principled 
 stance that objects which are not 'shared' are not handed over 
 to other threads without pinning them in the GC.

 Maybe a better option for D than ARC, as it is closer to what 
 people are used to.

In kernel programming there are plenty of atomic reference 
counted objects. The reason is that if you have a kernel that 
supports SMP you must have them, because you don't really know 
which CPU is working with a structure at any given time. These 
are often manually reference counted objects, which can lead to 
memory leak bugs, but those are not that hard to find.
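
A minimal sketch of such a manually counted object, written in C++ with `std::atomic` rather than real kernel primitives. `Buffer`, `buffer_get`, `buffer_put` and the `destroyed` counter are hypothetical names used only for illustration.

```cpp
#include <atomic>

// Demonstration-only counter of freed objects (assumption, not a kernel facility).
static std::atomic<int> destroyed{0};

// Hypothetical kernel-style object with an embedded atomic reference count.
struct Buffer {
    std::atomic<int> refs{1};  // the creator holds the first reference
    int payload = 0;
};

// Take an additional reference; safe to call from any CPU/thread that
// already holds a valid reference.
inline Buffer* buffer_get(Buffer* b) {
    b->refs.fetch_add(1, std::memory_order_relaxed);
    return b;
}

// Drop a reference; whoever drops the last one frees the object.
inline void buffer_put(Buffer* b) {
    // acq_rel orders all earlier uses of the object before the delete.
    if (b->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) {
        destroyed.fetch_add(1, std::memory_order_relaxed);
        delete b;
    }
}
```

Every `buffer_get` has to be paired with a `buffer_put` by hand; a forgotten `buffer_put` is the kind of leak described above.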

Is automatic atomic reference counting a contender for kernels? 
In kernels you want to reduce the increase/decrease of the 
counts. Therefore the Rust approach using 'clone' is better 
unless there is some optimizer that can figure it out. 
Performance is important in kernels, you don't want the kernel to 
steal useful CPU time that otherwise should go to programs.

In general I think that reference counting should be supported in 
D, not only implicitly but also under the hood with fat pointers. 
This will make D more attractive to performance applications. 
Another advantage is the reference counting can use malloc/free 
directly if needed without any complicated GC layer with 
associated meta data.

Also, a tracing GC in a kernel is in my opinion not desirable. For 
the reasons I previously mentioned: you want to reduce meta data, 
you want to reduce CPU time, you want to reduce fragmentation. 
Special allocators for structures are often used.

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 17:35:19 UTC, IGotD- wrote:
 Is automatic atomic reference counting a contender for kernels? 
 In kernels you want to reduce the increase/decrease of the 
 counts. Therefore the Rust approach using 'clone' is better 
 unless there is some optimizer that can figure it out. 
 Performance is important in kernels, you don't want the kernel 
 to steal useful CPU time that otherwise should go to programs.

I am not sure if kernel authors want automatic memory management; 
they tend to want full control and transparency. Maybe it is 
something people who write device drivers would consider.

 In general I think that reference counting should be supported 
 in D, not only implicitly but also under the hood with fat 
 pointers. This will make D more attractive to performance 
 applications. Another advantage is the reference counting can 
 use malloc/free directly if needed without any complicated GC 
 layer with associated meta data.

Yes, I would like to see it, just expect that there will be 
protests when people realize that they have to make ownership 
explicit.

 Also, a tracing GC in a kernel is in my opinion not desirable. 
 For the reasons I previously mentioned: you want to reduce meta 
 data, you want to reduce CPU time, you want to reduce 
 fragmentation. Special allocators for structures are often used.

Yes, an ARC solution should support fixed size allocators for 
types that are frequently allocated to get better speed.

Dec 06 2020

Paulo Pinto <pjmlp progtools.org> writes:

On Sunday, 6 December 2020 at 08:12:58 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
 GCs scan memory, sure.  Lots of variations.  Not germane.  Not 
 a rationale.

 We need to freeze the threads when collecting stacks/globals.

 D is employed at multiple "levels".  Whatever level you call 
 it, Go and modern JVMs employ low latency GCs in 
 multi-threaded environments.  Some people would like to use D 
 at that "level".

 Yes, but they don't allow low-level programming. Go also freezes 
 to sync threads; this has a rather profound impact on code 
 generation. They have spent a lot of effort on sync 
 instructions in code gen to lower the latency, AFAIK.

They surely do.

Looking forward to seeing D achieve the performance level that 
.NET 5 is capable of: it beats Google's own gRPC C++ 
implementation, and only the Rust implementation beats it.

https://www.infoq.com/news/2020/12/aspnet-core-improvement-dotnet-5/

And while on the subject of low level programming in JVM or .NET.

https://www.infoq.com/news/2020/12/net-5-runtime-improvements/

 Many of the performance improvements in the HTTP/2 
 implementation are related to the reimplementation from 

 "still is this kind of idea that managed languages are not 
 quite up to the task for some of those low-level super 
 performance sensitive components,

Rich Lander being one of the main .NET architects, and upcoming 
Java 16 features, http://openjdk.java.net/jeps/389 (JNI 
replacement), http://openjdk.java.net/jeps/393 (native memory 
management).

As I already mentioned in another thread, rebooting the language 
to pull in imaginary crowds will do more harm than good, while 
the languages deemed unusable by the same imaginary crowd just 
keep winning market share, slowly and steadily, even if it takes 
yet another couple of years.

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
 And while on the subject of low level programming in JVM or 
 .NET.

 https://www.infoq.com/news/2020/12/net-5-runtime-improvements/

It didn't say anything about low level, only SIMD intrinsics, 
which isn't really low level?

It also stated "When it came to something that is pure CPU raw 
computation doing nothing but number crunching, in general, you 
can still eke out better performance if you really focus on 
"pedal to the metal" with your C/C++ code."

So it is more of a Go contender, and Go is not a systems level 
language... Apples and oranges.

 As I already mentioned in another thread, rebooting the 
 language to pull in imaginary crowds will do more harm than 
 good, while the languages deemed unusable by the same imaginary 
 crowd just keep winning market share, slowly and steadily, even 
 if it takes yet another couple of years.

A fair number of people here are in that imaginary crowd.
So, I guess it isn't imaginary...

Dec 06 2020

Bruce Carneal <bcarneal gmail.com> writes:

On Sunday, 6 December 2020 at 16:42:00 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
 And while on the subject of low level programming in JVM or 
 .NET.

 https://www.infoq.com/news/2020/12/net-5-runtime-improvements/

 It didn't say anything about low level, only SIMD intrinsics, 
 which isn't really low level?

 It also stated "When it came to something that is pure CPU raw 
 computation doing nothing but number crunching, in general, you 
 can still eke out better performance if you really focus on 
 "pedal to the metal" with your C/C++ code."

So you must make the familiar "ease-of-programming" vs "x% of 
performance" choice, where 'x' is presumably much smaller than 
earlier.

 So it is more of a Go contender, and Go is not a systems level 
 language... Apples and oranges.

D is good for systems level work but that's not all.  I use it 
for projects where, in the past, I'd have split the work between 
two languages (Python and C/C++).  I much prefer working with a 
single language that spans the problem space.

If there is a way to extend D's reach with zero or a near-zero 
complexity increase as seen by the programmer, I believe we 
should (as/when resources allow of course).

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 17:28:52 UTC, Bruce Carneal wrote:
 D is good for systems level work but that's not all.  I use it 
 for projects where, in the past, I'd have split the work 
 between two languages (Python and C/C++).  I much prefer 
 working with a single language that spans the problem space.

My impression from reading the forums is that people either use D 
as a replacement for C/C++ or Python/numpy, so I think your 
experience covers the essential use case scenario that is 
dominating current D usage? Any improvements have to improve both 
dimensions, I agree.

 If there is a way to extend D's reach with zero or a near-zero 
 complexity increase as seen by the programmer, I believe we 
 should (as/when resources allow of course).

ARC involves a complexity increase, to some extent. Library 
authors have to think in a more principled way about when objects 
should be phased out and destructed, which I think tends to lead 
to better programs. It would also allow for faster precise 
collection. So it could be beneficial for all.

Dec 06 2020

Max Haughton <maxhaton gmail.com> writes:

On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
 How difficult would it be to add a selectable, low-latency GC 
 to dlang?

 Is it closer to "we can't get there from here" or "no big deal 
 if you already have the low-latency GC in hand"?

 I've heard Walter mention performance issues (write barriers, 
 IIRC).  I'm also interested in the GC-flavor performance 
 trade-offs, but here I'm just asking about feasibility.

 The only reasonable option for D is single threaded GC or ARC.

It has to be either some kind of heavily customisable small GC 
(i.e. with our resources the GC cannot please everyone), or ARC. 
The GC as it is just hurts the language.

Realistically, we probably need some kind of working group or at 
least serious discussion to really narrow down where to go in the 
future. The GC as it is now must go, we need borrowing to work 
with more than just pointers, etc.

The issue is that it can't just be done incrementally, it needs 
to be specified beforehand.

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 10:44:39 UTC, Max Haughton wrote:
 On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad 
 wrote:
 It has to be either some kind of heavily customisable small GC 
 (i.e. with our resources the GC cannot please everyone), or 
 arc. The GC as it is just hurts the language.

 Realistically, we probably need some kind of working group or 
 at least serious discussion to really narrow down where to go 
 in the future. The GC as it is now must go, we need borrowing 
 to work with more than just pointers, etc.

 The issue is that it can't just be done incrementally, it needs 
 to be specified beforehand.

ARC can be done incrementally: we can do it as a library first 
and use a modified version of the existing GC for detecting 
failed borrows at runtime during testing.

But all libraries that use owning pointers need ownership to be 
made explicit.

A static borrow checker and an ARC optimizer need a high-level 
IR, though. That is a lot of work.

Dec 06 2020

Max Haughton <maxhaton gmail.com> writes:

On Sunday, 6 December 2020 at 11:07:50 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 10:44:39 UTC, Max Haughton wrote:
 On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim 
 Grostad wrote:
 It has to be either some kind of heavily customisable small GC 
 (i.e. with our resources the GC cannot please everyone), or 
 arc. The GC as it is just hurts the language.

 Realistically, we probably need some kind of working group or 
 at least serious discussion to really narrow down where to go 
 in the future. The GC as it is now must go, we need borrowing 
 to work with more than just pointers, etc.

 The issue is that it can't just be done incrementally, it 
 needs to be specified beforehand.

 ARC can be done incrementally: we can do it as a library first 
 and use a modified version of the existing GC for detecting 
 failed borrows at runtime during testing.

 But all libraries that use owning pointers need ownership to be 
 made explicit.

 A static borrow checker and an ARC optimizer need a high-level 
 IR, though. That is a lot of work.

ARC with a library will have overhead unless the compiler/ABI is 
changed; e.g. unique_ptr in C++ has an indirection.

The AST effectively is a high-level IR. Not a good one, but good 
enough. The system Walter has built shows the means are there in 
the compiler already.

As things are at the moment, the annotations we have for pointers 
like scope go a long way, but the language doesn't deal with 
things like borrowing structs (and the contents of structs i.e. 
making a safe vector) properly yet. That is what needs thinking 
about.

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
 ARC with a library will have overhead unless the compiler/ABI 
 is changed e.g. unique_ptr in C++ has an indirection.

No, unique doesn't need indirection, and neither does ARC: we put 
the ref count at a negative offset.

shared_ptr is a fat pointer with the ref count as a separate 
object to support existing C libraries, and make weak_ptr easy to 
implement. But no need for indirection.
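
A rough C++ sketch of what "ref count at a negative offset" can look like: the count lives in a header allocated directly in front of the object, so the handle is a plain `T*` and reaching the object needs no extra hop. The names `RcHeader`, `rc_new`, `rc_retain`, `rc_release` and `rc_count` are hypothetical, the count is deliberately non-atomic (single-threaded use is assumed), and over-aligned types and throwing constructors are not handled.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <utility>

// Hypothetical header placed immediately before the payload. Aligning it to
// max_align_t keeps the payload that follows suitably aligned.
struct alignas(std::max_align_t) RcHeader {
    std::size_t count;
};

// Locate the header at a fixed negative offset from the payload pointer.
template <typename T>
RcHeader* rc_header(T* p) {
    return reinterpret_cast<RcHeader*>(
        reinterpret_cast<unsigned char*>(p) - sizeof(RcHeader));
}

// Allocate header and payload in one block; return a pointer to the payload.
template <typename T, typename... Args>
T* rc_new(Args&&... args) {
    static_assert(alignof(T) <= alignof(std::max_align_t),
                  "over-aligned types are not handled in this sketch");
    void* block = std::malloc(sizeof(RcHeader) + sizeof(T));
    if (!block) throw std::bad_alloc();
    RcHeader* header = new (block) RcHeader{1};  // creator owns one reference
    return new (header + 1) T(std::forward<Args>(args)...);
}

template <typename T>
void rc_retain(T* p) { ++rc_header(p)->count; }

template <typename T>
std::size_t rc_count(T* p) { return rc_header(p)->count; }

// Drop one reference; destroy the payload and free the block on the last one.
template <typename T>
void rc_release(T* p) {
    RcHeader* header = rc_header(p);
    if (--header->count == 0) {
        p->~T();
        std::free(header);
    }
}
```

The trade-off against the `shared_ptr` layout described above is that the object must be allocated through `rc_new`, so memory handed over by an existing C library cannot be adopted this way.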

 The AST effectively is a high-level IR. Not a good one, but 
 good enough. The system Walter has built shows the means are 
 there in the compiler already.

I think you need a new IR, but it does not have to be used for 
code gen; it can point back to the AST nodes that represent ARC 
pointer assignments.

One could probably translate the one used in Rust, even.

Dec 06 2020

Max Haughton <maxhaton gmail.com> writes:

On Sunday, 6 December 2020 at 11:35:17 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
 [...]

 No, unique doesn't need indirection, and neither does ARC: we 
 put the ref count at a negative offset.

 shared_ptr is a fat pointer with the ref count as a separate 
 object to support existing C libraries, and make weak_ptr easy 
 to implement. But no need for indirection.

 [...]

 I think you need a new IR, but it does not have to be used for 
 code gen, it can point back to the ast nodes that represent ARC 
 pointer assignments.

 One could probably translate the one used in Rust, even.

https://gcc.godbolt.org/z/bnbMeY

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 14:11:41 UTC, Max Haughton wrote:
 On Sunday, 6 December 2020 at 11:35:17 UTC, Ola Fosheim Grostad 
 wrote:
 On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
 [...]

 No, unique doesn't need indirection, and neither does ARC: we 
 put the ref count at a negative offset.

 shared_ptr is a fat pointer with the ref count as a separate 
 object to support existing C libraries, and make weak_ptr easy 
 to implement. But no need for indirection.

 [...]

 I think you need a new IR, but it does not have to be used for 
 code gen, it can point back to the ast nodes that represent 
 ARC pointer assignments.

 One could probably translate the one used in Rust, even.

 https://gcc.godbolt.org/z/bnbMeY

If you pass something as a parameter then there may or may not be 
an extra reference involved. Not specific for smart pointers, but 
ARC optimization should take care of that.

Dec 06 2020

IGotD- <nise nise.com> writes:

On Sunday, 6 December 2020 at 11:07:50 UTC, Ola Fosheim Grostad 
wrote:
 ARC can be done incrementally: we can do it as a library first 
 and use a modified version of the existing GC for detecting 
 failed borrows at runtime during testing.

 But all libraries that use owning pointers need ownership to be 
 made explicit.

 A static borrow checker and an ARC optimizer need a high-level 
 IR, though. That is a lot of work.

The Rust approach is interesting as it doesn't need an ARC 
optimizer. Everything is a move, so no increase/decrease is done 
when moving. The increase happens only when the programmer 
decides to 'clone' the reference. This inherently becomes 
optimized without any compiler support. However, this requires 
that the programmer inserts 'clone' when necessary, so it isn't 
really automatic.
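
The same effect can be approximated in C++ with `std::shared_ptr`, with one important difference: in C++ the copy (the count increase) is the implicit default and the move is opt-in, whereas in Rust it is the other way around. The helper names `consume`, `count_after_move` and `count_after_copy` below are made up for this sketch.

```cpp
#include <memory>
#include <utility>

// Takes ownership of a reference and reports how many owners exist while
// the callee holds it.
long consume(std::shared_ptr<int> p) {
    return p.use_count();
}

// Moving hands the existing reference over: no increase, no decrease.
long count_after_move() {
    auto a = std::make_shared<int>(7);
    return consume(std::move(a));  // `a` is left empty after the move
}

// Copying creates a second owner, which is what 'clone' does explicitly
// in Rust; here it happens implicitly at the call site.
long count_after_copy() {
    auto a = std::make_shared<int>(7);
    return consume(a);             // count is incremented for the callee
}
```

Here `count_after_move()` observes a single owner and `count_after_copy()` observes two, which is the increase that a move avoids.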

I was thinking about how to deal with this in D, and the question 
is whether it would be better to be able to make move the default 
on a per-type basis. This way we can implement Rust-style 
reference counting without intruding too much on the rest of the 
language. The question is whether we want this or whether we 
should go for a fully automated approach where the programmer 
doesn't need to worry about 'clone'.

Dec 06 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Sunday, 6 December 2020 at 12:58:44 UTC, IGotD- wrote:
 I was thinking about how to deal with this in D, and the 
 question is whether it would be better to be able to make move 
 the default on a per-type basis. This way we can implement 
 Rust-style reference counting without intruding too much on the 
 rest of the language. The question is whether we want this or 
 whether we should go for a fully automated approach where the 
 programmer doesn't need to worry about 'clone'.

I don't know, but I suspect that people who use D want something 
more high level than Rust? But I don't use Rust, so...

Dec 06 2020

oddp <oddp posteo.de> writes:

On 06.12.20 06:16, Bruce Carneal via Digitalmars-d-learn wrote:
 How difficult would it be to add a selectable, low-latency GC to dlang?
 
 Is it closer to "we can't get there from here" or "no big deal if you already
 have the low-latency GC in hand"?
 
 I've heard Walter mention performance issues (write barriers, IIRC).  I'm also
 interested in the GC-flavor performance trade-offs, but here I'm just asking
 about feasibility.
 

What our closest competition, Nim, is up to with their mark-and-sweep
replacement ORC [1]:

ORC is the existing ARC algorithm (first shipped in version 1.2) plus a cycle
collector

[...]

ARC is Nim’s pure reference-counting GC, however, many reference count
operations are optimized away: Thanks to move semantics, the construction of a
data structure does not involve RC operations. And thanks to “cursor
inference”, another innovation of Nim’s ARC implementation, common data
structure traversals do not involve RC operations either!

[...]

Benchmark:

Metric/algorithm         ORC    Mark&Sweep
Latency (Avg)      320.49 us      65.31 ms
Latency (Max)        6.24 ms     204.79 ms
Requests/sec        30963.96        282.69
Transfer/sec         1.48 MB      13.80 KB
Max memory           137 MiB       153 MiB

That’s right, ORC is over 100 times faster than the M&S GC. The reason is
that ORC only touches memory that the mutator touches, too.

[...]

- uses 2x less memory than classical GCs
- can be orders of magnitudes faster in throughput
- offers sub-millisecond latencies
- suited for (hard) realtime systems
- no “stop the world” phase
- oblivious to the size of the heap or the used stack space.

There's also some discussion on /r/programming [2] and hackernews [3], but it
hasn't taken off yet.

[1] https://nim-lang.org/blog/2020/12/08/introducing-orc.html
[2] https://old.reddit.com/r/programming/comments/k95cc5/introducing_orc_nim_nextgen_memory_management/
[3] https://news.ycombinator.com/item?id=25345770

Dec 08 2020
