
digitalmars.D.learn - low-latency GC

reply Bruce Carneal <bcarneal gmail.com> writes:
How difficult would it be to add a selectable, low-latency GC to 
dlang?

Is it closer to "we can't get there from here" or "no big deal if 
you already have the low-latency GC in hand"?

I've heard Walter mention performance issues (write barriers 
IIRC).  I'm also interested in the GC-flavor performance 
trade-offs, but here I'm just asking about feasibility.
Dec 05 2020
next sibling parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
 How difficult would it be to add a selectable, low-latency GC 
 to dlang?

 Is it closer to "we can't get there from here" or "no big deal 
 if you already have the low-latency GC in hand"?

 I've heard Walter mention performance issues (write barriers 
 IIRC).  I'm also interested in the GC-flavor performance 
 trade-offs, but here I'm just asking about feasibility.
The only reasonable option for D is a single-threaded GC or ARC.
Dec 05 2020
next sibling parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
 How difficult would it be to add a selectable, low-latency GC 
 to dlang?

 Is it closer to "we can't get there from here" or "no big deal 
 if you already have the low-latency GC in hand"?

 I've heard Walter mention performance issues (write barriers 
 IIRC).  I'm also interested in the GC-flavor performance 
 trade-offs, but here I'm just asking about feasibility.
The only reasonable option for D is a single-threaded GC or ARC.
OK. Some rationale? Do you, for example, believe that no-probable-dlanger could benefit from a low-latency GC? That it is too hard to implement? That the language is somehow incompatible? That ...
Dec 05 2020
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 05:41:05 UTC, Bruce Carneal wrote:
 OK.  Some rationale?  Do you, for example, believe that 
 no-probable-dlanger could benefit from a low-latency GC?  That 
 it is too hard to implement?  That the language is somehow 
 incompatible? That ...
The GC needs to scan all the affected call stacks before it can do incremental collection. Multi-threaded GC is generally not compatible with low-level programming.
Dec 05 2020
parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Sunday, 6 December 2020 at 06:52:41 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 05:41:05 UTC, Bruce Carneal wrote:
 OK.  Some rationale?  Do you, for example, believe that 
 no-probable-dlanger could benefit from a low-latency GC?  That 
 it is too hard to implement?  That the language is somehow 
 incompatible? That ...
The GC needs to scan all the affected call stacks before it can do incremental collection. Multi-threaded GC is generally not compatible with low-level programming.
GCs scan memory, sure.  Lots of variations.  Not germane.  Not a rationale.

D is employed at multiple "levels".  Whatever level you call it, Go and modern JVMs employ low-latency GCs in multi-threaded environments.  Some people would like to use D at that "level".

My question remains: how difficult would it be to bring such technology to D as a GC option?  Is it precluded somehow by the language?  Is it doable but quite a lot of effort because ...?  Is it no big deal once you have the GC itself because you only need xyz hooks?  Is it ...?

Also, I think Walter may have been concerned about read barrier overhead but, again, I'm looking for feasibility information.  What would it take to get something that we could compare?
Dec 05 2020
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
 GCs scan memory, sure.  Lots of variations.  Not germane.  Not 
 a rationale.
We need to freeze the threads when collecting stacks/globals.
 D is employed at multiple "levels".  Whatever level you call 
 it, Go and modern JVMs employ low latency GCs in multi-threaded 
 environments.  Some people would like to use D at that "level".
Yes, but they don't allow low-level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency, AFAIK.
 My question remains: how difficult would it be to bring such 
 technology to D as a GC option?  Is it precluded somehow by the 
 language?   Is it doable but quite a lot of effort because ...?
  Is it no big deal once you have the GC itself because you only 
 need xyz hooks? Is it ...?
Get rid of the system stack and globals. Use only closures and put in a restrictive memory model. Then maybe you can get a fully no-freeze, multi-threaded GC. That would be a different language.
 Also, I think Walter may have been concerned about read barrier 
 overhead but, again, I'm looking for feasibility information.  
 What would it take to get something that we could compare?
Just add ARC + a single-threaded GC. And even that is quite expensive.
Dec 06 2020
next sibling parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Sunday, 6 December 2020 at 08:12:58 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
 GCs scan memory, sure.  Lots of variations.  Not germane.  Not 
 a rationale.
We need to freeze the threads when collecting stacks/globals.
OK. Low-latency GCs exist.
 D is employed at multiple "levels".  Whatever level you call 
 it, Go and modern JVMs employ low latency GCs in 
 multi-threaded environments.  Some people would like to use D 
 at that "level".
Yes, but they don't allow low-level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency, AFAIK.
So, much of the difficulty in bringing low-latency GC to dlang would be the large code gen changes required. If it is a really big effort then that is all we need to know. Not worth it until we can see a big payoff and have more resources.
 My question remains: how difficult would it be to bring such 
 technology to D as a GC option?  Is it precluded somehow by 
 the language?   Is it doable but quite a lot of effort because 
 ...?
  Is it no big deal once you have the GC itself because you 
 only need xyz hooks? Is it ...?
Get rid of the system stack and globals. Use only closures and put in a restrictive memory model. Then maybe you can get a fully no-freeze, multi-threaded GC. That would be a different language.
It would be, but I don't think it is the only way to get a lower-latency GC. That said, if the code gen effort you mentioned earlier is a big deal, then no need to speculate/examine further.
 Also, I think Walter may have been concerned about read 
 barrier overhead but, again, I'm looking for feasibility 
 information.  What would it take to get something that we 
 could compare?
Just add ARC + single threaded GC. And even that is quite expensive.
Thanks for the feedback.
Dec 06 2020
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 08:36:49 UTC, Bruce Carneal wrote:
 Yes, but they don't allow low-level programming. Go also 
 freezes to sync threads; this has a rather profound impact on 
 code generation. They have spent a lot of effort on sync 
 instructions in code gen to lower the latency, AFAIK.
So, much of the difficulty in bringing low-latency GC to dlang would be the large code gen changes required. If it is a really big effort then that is all we need to know. Not worth it until we can see a big payoff and have more resources.
Well, you could in theory avoid putting owning pointers on the stack/globals, or require that they are registered as GC roots. Then you don't have to scan the stack. All you need then is write barriers, IIRC.
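Roughly, registering a root could look like this in D. A minimal sketch, assuming a hypothetical OwnedHandle wrapper; GC.addRoot/GC.removeRoot are existing druntime calls in core.memory:

    import core.memory : GC;

    struct OwnedHandle
    {
        void* p;

        @disable this(this);       // single owner, so removeRoot runs exactly once

        this(size_t size)
        {
            p = GC.malloc(size);
            GC.addRoot(p);         // keeps the object alive without stack scanning
        }

        ~this()
        {
            if (p !is null)
            {
                GC.removeRoot(p);  // unregister so the GC may reclaim it
                p = null;
            }
        }
    }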
Dec 06 2020
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 08:59:49 UTC, Ola Fosheim Grostad 
wrote:
 Well, you could in theory avoid putting owning pointers on the 
 stack/globals or require that they are registered as gc roots. 
 Then you don't have to scan the stack. All you need then is 
 write barriers. IIRC
And read barriers... I assume. However, with single-threaded incremental collection, write barriers should be enough.
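For illustration, a write barrier for a single-threaded incremental collector could be as simple as this sketch; writeBarrier and remembered are made-up names, not druntime API:

    // The mutator routes every pointer store through writeBarrier; the
    // collector then rescans only the remembered slots, not the whole heap.
    __gshared void*[] remembered;

    void writeBarrier(void** slot, void* newValue)
    {
        remembered ~= cast(void*) slot;   // log the mutated location
        *slot = newValue;                 // then perform the actual store
    }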
Dec 06 2020
prev sibling parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Sunday, 6 December 2020 at 08:59:49 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 08:36:49 UTC, Bruce Carneal wrote:
 Yes, but they don't allow low-level programming. Go also 
 freezes to sync threads; this has a rather profound impact on 
 code generation. They have spent a lot of effort on sync 
 instructions in code gen to lower the latency, AFAIK.
So, much of the difficulty in bringing low-latency GC to dlang would be the large code gen changes required. If it is a really big effort then that is all we need to know. Not worth it until we can see a big payoff and have more resources.
Well, you could in theory avoid putting owning pointers on the stack/globals or require that they are registered as gc roots. Then you don't have to scan the stack. All you need then is write barriers. IIRC
'shared' with teeth?
Dec 06 2020
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 14:45:21 UTC, Bruce Carneal wrote:
 Well, you could in theory avoid putting owning pointers on the 
 stack/globals or require that they are registered as gc roots. 
 Then you don't have to scan the stack. All you need then is 
 write barriers. IIRC
'shared' with teeth?
It was more a hypothetical, as read barriers are too expensive. But write barriers should be OK, so a single-threaded incremental collector could work well if D takes a principled stance that objects not marked 'shared' are never handed over to other threads without pinning them in the GC.

Maybe a better option for D than ARC, as it is closer to what people are used to.
Dec 06 2020
parent reply IGotD- <nise nise.com> writes:
On Sunday, 6 December 2020 at 15:44:32 UTC, Ola Fosheim Grøstad 
wrote:
 It was more a hypothetical, as read barriers are too expensive. 
 But write barriers should be OK, so a single-threaded 
 incremental collector could work well if D takes a principled 
 stance that objects not marked 'shared' are never handed over 
 to other threads without pinning them in the GC.

 Maybe a better option for D than ARC, as it is closer to what 
 people are used to.
In kernel programming there are plenty of atomically reference-counted objects. The reason is that if you have a kernel that supports SMP, you must have them, because you don't really know which CPU is working with a structure at any given time. These are often manually reference-counted objects, which can lead to memory-leak bugs, but those are not that hard to find.

Is automatic atomic reference counting a contender for kernels? In kernels you want to reduce the increases/decreases of the counts. Therefore the Rust approach using 'clone' is better, unless there is some optimizer that can figure it out. Performance is important in kernels: you don't want the kernel to steal useful CPU time that otherwise should go to programs.

In general I think that reference counting should be supported in D, not only implicitly but also under the hood with fat pointers. This will make D more attractive for performance applications. Another advantage is that reference counting can use malloc/free directly if needed, without any complicated GC layer with associated metadata.

Also, a tracing GC in a kernel is in my opinion not desirable. For the reasons I previously mentioned: you want to reduce metadata, you want to reduce CPU time, you want to reduce fragmentation. Special allocators for structures are often used.
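As a rough sketch of the manual pattern described above (KernelObject and the free step are placeholders; core.atomic is real):

    import core.atomic : atomicOp;

    struct KernelObject
    {
        shared int refs = 1;           // a new object starts with one reference
    }

    void retain(KernelObject* o)
    {
        atomicOp!"+="(o.refs, 1);      // safe from any CPU
    }

    void release(KernelObject* o)
    {
        if (atomicOp!"-="(o.refs, 1) == 0)
        {
            // last reference dropped: free the object here
        }
    }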
Dec 06 2020
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 17:35:19 UTC, IGotD- wrote:
 Is automatic atomic reference counting a contender for kernels? 
 In kernels you want to reduce the increases/decreases of the 
 counts. Therefore the Rust approach using 'clone' is better, 
 unless there is some optimizer that can figure it out. 
 Performance is important in kernels: you don't want the kernel 
 to steal useful CPU time that otherwise should go to programs.
I am not sure kernel authors want automatic memory management; they tend to want full control and transparency. Maybe it is something people who write device drivers would consider.
 In general I think that reference counting should be supported 
 in D, not only implicitly but also under the hood with fat 
 pointers. This will make D more attractive to performance 
 applications. Another advantage is the reference counting can 
 use malloc/free directly if needed without any complicated GC 
 layer with associated meta data.
Yes, I would like to see it; I just expect that there will be protests when people realize that they have to make ownership explicit.
 Also, a tracing GC in a kernel is in my opinion not desirable. 
 For the reasons I previously mentioned: you want to reduce 
 metadata, you want to reduce CPU time, you want to reduce 
 fragmentation. Special allocators for structures are often used.
Yes, an ARC solution should support fixed-size allocators for frequently allocated types, to get better speed.
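For instance, a fixed-size allocator that recycles blocks through an intrusive free list might look like this; FreeList is a made-up name, just a sketch of the idea:

    import core.stdc.stdlib : malloc;

    struct FreeList(size_t blockSize)
        if (blockSize >= (void*).sizeof)
    {
        private void* head;               // singly linked list of free blocks

        void* allocate()
        {
            if (head is null)
                return malloc(blockSize); // fall back to the general heap
            auto p = head;
            head = *cast(void**) p;       // next pointer lives in the free block
            return p;
        }

        void deallocate(void* p)
        {
            *cast(void**) p = head;       // thread the block back onto the list
            head = p;
        }
    }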
Dec 06 2020
prev sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
On Sunday, 6 December 2020 at 08:12:58 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
 GCs scan memory, sure.  Lots of variations.  Not germane.  Not 
 a rationale.
We need to freeze the threads when collecting stacks/globals.
 D is employed at multiple "levels".  Whatever level you call 
 it, Go and modern JVMs employ low latency GCs in 
 multi-threaded environments.  Some people would like to use D 
 at that "level".
Yes, but they don't allow low-level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency, AFAIK.
They surely do. Looking forward to seeing D achieve the same performance level that .NET 5 is capable of: it beats Google's own gRPC C++ implementation; only the Rust implementation beats it.

https://www.infoq.com/news/2020/12/aspnet-core-improvement-dotnet-5/

And while on the subject of low-level programming in the JVM or .NET:

https://www.infoq.com/news/2020/12/net-5-runtime-improvements/
 Many of the performance improvements in the HTTP/2 
 implementation are related to the reimplementation from 

 "still is this kind of idea that managed languages are not 
 quite up to the task for some of those low-level super 
 performance sensitive components,
Rich Lander being one of the main .NET architects. And upcoming Java 16 features: http://openjdk.java.net/jeps/389 (JNI replacement) and http://openjdk.java.net/jeps/393 (native memory management).

As I already mentioned in another thread, rebooting the language to pull in imaginary crowds will only do more damage than good, while the ones deemed unusable by the same imaginary crowd just keep winning market share, slowly and steadily, even if it takes yet another couple of years.
Dec 06 2020
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
 And while on the subject of low level programming in JVM or 
 .NET.

 https://www.infoq.com/news/2020/12/net-5-runtime-improvements/
Didn't say anything about low level, only SIMD intrinsics, which isn't really low level? It also stated:

"When it came to something that is pure CPU raw computation doing nothing but number crunching, in general, you can still eke out better performance if you really focus on "pedal to the metal" with your C/C++ code."

So it is more of a Go contender, and Go is not a systems-level language... Apples and oranges.
 As I already mentioned in another thread, rebooting the 
 language to pull in imaginary crowds will only do more damage 
 than good, while the ones deemed unusable by the same imaginary 
 crowd just keep winning market share, slowly and steadily, even 
 if it takes yet another couple of years.
A fair number of people here are in that imaginary crowd. So, I guess it isn't imaginary...
Dec 06 2020
parent reply Bruce Carneal <bcarneal gmail.com> writes:
On Sunday, 6 December 2020 at 16:42:00 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
 And while on the subject of low level programming in JVM or 
 .NET.

 https://www.infoq.com/news/2020/12/net-5-runtime-improvements/
Didn't say anything about low level, only SIMD intrinsics, which isn't really low level? It also stated: "When it came to something that is pure CPU raw computation doing nothing but number crunching, in general, you can still eke out better performance if you really focus on "pedal to the metal" with your C/C++ code."
So you must make the familiar "ease-of-programming" vs "x% of performance" choice, where 'x' is presumably much smaller than earlier.
 So it is more of a Go contender, and Go is not a systems level 
 language... Apples and oranges.
D is good for systems-level work but that's not all.  I use it for projects where, in the past, I'd have split the work between two languages (Python and C/C++).  I much prefer working with a single language that spans the problem space.

If there is a way to extend D's reach with zero or a near-zero complexity increase as seen by the programmer, I believe we should (as/when resources allow, of course).
Dec 06 2020
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 17:28:52 UTC, Bruce Carneal wrote:
 D is good for systems level work but that's not all.  I use it 
 for projects where, in the past, I'd have split the work 
 between two languages (Python and C/C++).  I much prefer 
 working with a single language that spans the problem space.
My impression from reading the forums is that people use D either as a replacement for C/C++ or for Python/numpy, so I think your experience covers the essential use-case scenario that dominates current D usage. Any improvements have to improve both dimensions, I agree.
 If there is a way to extend D's reach with zero or a near-zero 
 complexity increase as seen by the programmer, I believe we 
 should (as/when resources allow of course).
ARC involves a complexity increase, to some extent. Library authors have to think in a more principled way about when objects should be phased out and destructed, which I think tends to lead to better programs. It would also allow for faster precise collection. So it could be beneficial for all.
Dec 06 2020
prev sibling parent reply Max Haughton <maxhaton gmail.com> writes:
On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
 How difficult would it be to add a selectable, low-latency GC 
 to dlang?

 Is it closer to "we can't get there from here" or "no big deal 
 if you already have the low-latency GC in hand"?

 I've heard Walter mention performance issues (write barriers 
 IIRC).  I'm also interested in the GC-flavor performance 
 trade-offs, but here I'm just asking about feasibility.
The only reasonable option for D is a single-threaded GC or ARC.
It has to be either some kind of heavily customisable small GC (i.e. with our resources the GC cannot please everyone), or ARC. The GC as it is just hurts the language.

Realistically, we probably need some kind of working group or at least serious discussion to really narrow down where to go in the future. The GC as it is now must go, we need borrowing to work with more than just pointers, etc.

The issue is that it can't just be done incrementally; it needs to be specified beforehand.
Dec 06 2020
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 10:44:39 UTC, Max Haughton wrote:
 On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad 
 wrote:
 It has to be either some kind of heavily customisable small GC 
 (i.e. with our resources the GC cannot please everyone), or 
 arc. The GC as it is just hurts the language.

 Realistically, we probably need some kind of working group or 
 at least serious discussion to really narrow down where to go 
 in the future. The GC as it is now must go, we need borrowing 
 to work with more than just pointers, etc.

 The issue is that it can't just be done incrementally, it needs 
 to be specified beforehand.
ARC can be done incrementally; we can do it as a library first and use a modified version of the existing GC for detecting failed borrows at runtime during testing.

But all libraries that use owning pointers need ownership to be made explicit.

A static borrow checker and an ARC optimizer need a high-level IR, though. A lot of work.
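As a sketch of the "library first" idea, a reference-counted pointer can be built today from postblit and destructor. Rc and its layout here are illustrative assumptions, not a concrete proposal, and it only handles plain value types:

    import core.stdc.stdlib : malloc, free;

    struct Rc(T)
    {
        private struct Payload { size_t count; T value; }
        private Payload* p;

        static Rc make(T value)
        {
            Rc r;
            r.p = cast(Payload*) malloc(Payload.sizeof);
            r.p.count = 1;
            r.p.value = value;
            return r;
        }

        this(this)      // copying a handle bumps the count
        {
            if (p) ++p.count;
        }

        ~this()         // the last owner frees the payload
        {
            if (p && --p.count == 0) free(p);
        }

        ref T get() { return p.value; }
    }

Usage would be e.g. auto a = Rc!int.make(42); auto b = a; // count is now 2.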
Dec 06 2020
next sibling parent reply Max Haughton <maxhaton gmail.com> writes:
On Sunday, 6 December 2020 at 11:07:50 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 10:44:39 UTC, Max Haughton wrote:
 On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim 
 Grostad wrote:
 It has to be either some kind of heavily customisable small GC 
 (i.e. with our resources the GC cannot please everyone), or 
 arc. The GC as it is just hurts the language.

 Realistically, we probably need some kind of working group or 
 at least serious discussion to really narrow down where to go 
 in the future. The GC as it is now must go, we need borrowing 
 to work with more than just pointers, etc.

 The issue is that it can't just be done incrementally, it 
 needs to be specified beforehand.
ARC can be done incrementally; we can do it as a library first and use a modified version of the existing GC for detecting failed borrows at runtime during testing. But all libraries that use owning pointers need ownership to be made explicit. A static borrow checker and an ARC optimizer need a high-level IR, though. A lot of work.
ARC with a library will have overhead unless the compiler/ABI is changed; e.g. unique_ptr in C++ has an indirection.

The AST effectively is a high-level IR. Not a good one, but good enough. The system Walter has built shows the means are there in the compiler already.

As things are at the moment, the annotations we have for pointers, like scope, go a long way, but the language doesn't deal with things like borrowing structs (and the contents of structs, i.e. making a safe vector) properly yet. That is what needs thinking about.
Dec 06 2020
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
 ARC with a library will have overhead unless the compiler/ABI 
 is changed e.g. unique_ptr in C++ has an indirection.
No, unique_ptr doesn't need indirection, and neither does ARC; we put the ref count at a negative offset. shared_ptr is a fat pointer with the ref count as a separate object, to support existing C libraries and make weak_ptr easy to implement. But there is no need for indirection.
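A minimal sketch of that negative-offset layout in D (arcAlloc/arcRelease are made-up names; the point is that the handle itself stays a single word):

    import core.stdc.stdlib : malloc, free;

    void* arcAlloc(size_t size)
    {
        auto raw = cast(size_t*) malloc(size_t.sizeof + size);
        raw[0] = 1;          // the count lives just below the payload
        return raw + 1;      // user code only ever sees the payload address
    }

    ref size_t refCount(void* payload)
    {
        return *(cast(size_t*) payload - 1);   // negative-offset access
    }

    void arcRelease(void* payload)
    {
        if (--refCount(payload) == 0)
            free(cast(size_t*) payload - 1);   // free from the true block start
    }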
 The AST effectively is a high-level IR. Not a good one, but 
 good enough. The system Walter has built shows the means are 
 there in the compiler already.
I think you need a new IR, but it does not have to be used for code gen; it can point back to the AST nodes that represent ARC pointer assignments. One could probably even translate the one used in Rust.
Dec 06 2020
parent reply Max Haughton <maxhaton gmail.com> writes:
On Sunday, 6 December 2020 at 11:35:17 UTC, Ola Fosheim Grostad 
wrote:
 On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
 [...]
No, unique_ptr doesn't need indirection, and neither does ARC; we put the ref count at a negative offset. shared_ptr is a fat pointer with the ref count as a separate object, to support existing C libraries and make weak_ptr easy to implement. But there is no need for indirection.
 [...]
I think you need a new IR, but it does not have to be used for code gen, it can point back to the ast nodes that represent ARC pointer assignments. One could probably translate the one used in Rust, even.
https://gcc.godbolt.org/z/bnbMeY
Dec 06 2020
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 14:11:41 UTC, Max Haughton wrote:
 On Sunday, 6 December 2020 at 11:35:17 UTC, Ola Fosheim Grostad 
 wrote:
 On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
 [...]
No, unique_ptr doesn't need indirection, and neither does ARC; we put the ref count at a negative offset. shared_ptr is a fat pointer with the ref count as a separate object, to support existing C libraries and make weak_ptr easy to implement. But there is no need for indirection.
 [...]
I think you need a new IR, but it does not have to be used for code gen, it can point back to the ast nodes that represent ARC pointer assignments. One could probably translate the one used in Rust, even.
https://gcc.godbolt.org/z/bnbMeY
If you pass something as a parameter, then there may or may not be an extra reference involved. That is not specific to smart pointers, but ARC optimization should take care of it.
Dec 06 2020
prev sibling parent reply IGotD- <nise nise.com> writes:
On Sunday, 6 December 2020 at 11:07:50 UTC, Ola Fosheim Grostad 
wrote:
 ARC can be done incrementally; we can do it as a library first 
 and use a modified version of the existing GC for detecting 
 failed borrows at runtime during testing.

 But all libraries that use owning pointers need ownership to be 
 made explicit.

 A static borrow checker and an ARC optimizer need a high-level 
 IR, though. A lot of work.
The Rust approach is interesting, as it doesn't need an ARC optimizer. Everything is a move, so no increase/decrease is done for that. The count is only increased when the programmer decides to 'clone' the reference. This inherently becomes optimized without any compiler support. However, it requires that the programmer inserts 'clone' when necessary, so it isn't really automatic.

I was thinking about how to deal with this in D, and the question is whether it would be better to be able to control move-as-default on a per-type basis. This way we can implement Rust-style reference counting without intruding too much on the rest of the language. The question is whether we want this, or whether we should go for a fully automated approach where the programmer doesn't need to worry about 'clone'.
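To illustrate, D can already express a move-only handle with an explicit clone, so the count changes only where the programmer asks for a copy. MoveRc is hypothetical, with the payload and the final decrement elided:

    struct MoveRc
    {
        private size_t* count;   // shared refcount; payload elided for brevity

        @disable this(this);     // no implicit copies, so no hidden RC traffic

        MoveRc clone()           // the one place the count is incremented
        {
            if (count) ++(*count);
            MoveRc r;
            r.count = count;
            return r;
        }
    }

    // auto b = a.clone();   // explicit, Rust-style
    // auto c = a;           // compile error: copying is disabled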
Dec 06 2020
parent Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Sunday, 6 December 2020 at 12:58:44 UTC, IGotD- wrote:
 I was thinking about how to deal with this in D, and the 
 question is whether it would be better to be able to control 
 move-as-default on a per-type basis. This way we can implement 
 Rust-style reference counting without intruding too much on the 
 rest of the language. The question is whether we want this, or 
 whether we should go for a fully automated approach where the 
 programmer doesn't need to worry about 'clone'.
I don't know, but I suspect that people who use D want something more high-level than Rust? But I don't use Rust, so...
Dec 06 2020
prev sibling next sibling parent oddp <oddp posteo.de> writes:
On 06.12.20 06:16, Bruce Carneal via Digitalmars-d-learn wrote:
 How difficult would it be to add a, selectable, low-latency GC to dlang?
 
 Is it closer to "we cant get there from here" or "no big deal if you already
have the low-latency GC 
 in hand"?
 
 I've heard Walter mention performance issues (write barriers IIRC).  I'm also
interested in the 
 GC-flavor performance trade offs but here I'm just asking about feasibility.
 
What our closest competition, Nim, is up to with their mark-and-sweep replacement ORC [1]: ORC is the existing ARC algorithm (first shipped in version 1.2) plus a cycle collector [...] ARC is Nim’s pure reference-counting GC, however, many reference count operations are optimized away: Thanks to move semantics, the construction of a data structure does not involve RC operations. And thanks to “cursor inference”, another innovation of Nim’s ARC implementation, common data structure traversals do not involve RC operations either! [...] Benchmark: Metric/algorithm ORC Mark&Sweep Latency (Avg) 320.49 us 65.31 ms Latency (Max) 6.24 ms 204.79 ms Requests/sec 30963.96 282.69 Transfer/sec 1.48 MB 13.80 KB Max memory 137 MiB 153 MiB That’s right, ORC is over 100 times faster than the M&S GC. The reason is that ORC only touches memory that the mutator touches, too. [...] - uses 2x less memory than classical GCs - can be orders of magnitudes faster in throughput - offers sub-millisecond latencies - suited for (hard) realtime systems - no “stop the world” phase - oblivious to the size of the heap or the used stack space. [1] https://nim-lang.org/blog/2020/12/08/introducing-orc.html
Dec 08 2020
prev sibling parent oddp <oddp posteo.de> writes:
On 06.12.20 06:16, Bruce Carneal via Digitalmars-d-learn wrote:
 How difficult would it be to add a selectable, low-latency GC to dlang?

 Is it closer to "we can't get there from here" or "no big deal if you 
 already have the low-latency GC in hand"?

 I've heard Walter mention performance issues (write barriers IIRC).  I'm 
 also interested in the GC-flavor performance trade-offs, but here I'm just 
 asking about feasibility.
What our closest competition, Nim, is up to with their mark-and-sweep replacement ORC [1]:

 ORC is the existing ARC algorithm (first shipped in version 1.2) plus a cycle collector [...] ARC is Nim's pure reference-counting GC, however, many reference count operations are optimized away: Thanks to move semantics, the construction of a data structure does not involve RC operations. And thanks to "cursor inference", another innovation of Nim's ARC implementation, common data structure traversals do not involve RC operations either! [...]

Benchmark:

 Metric/algorithm   ORC         Mark&Sweep
 Latency (Avg)      320.49 us   65.31 ms
 Latency (Max)      6.24 ms     204.79 ms
 Requests/sec       30963.96    282.69
 Transfer/sec       1.48 MB     13.80 KB
 Max memory         137 MiB     153 MiB

 That's right, ORC is over 100 times faster than the M&S GC. The reason is that ORC only touches memory that the mutator touches, too. [...]

 - uses 2x less memory than classical GCs
 - can be orders of magnitudes faster in throughput
 - offers sub-millisecond latencies
 - suited for (hard) realtime systems
 - no "stop the world" phase
 - oblivious to the size of the heap or the used stack space

There's also some discussion on /r/programming [2] and hackernews [3], but it hasn't taken off yet.

[1] https://nim-lang.org/blog/2020/12/08/introducing-orc.html
[2] https://old.reddit.com/r/programming/comments/k95cc5/introducing_orc_nim_nextgen_memory_management/
[3] https://news.ycombinator.com/item?id=25345770
Dec 08 2020