digitalmars.D.learn - low-latency GC
- Bruce Carneal (7/7) Dec 05 2020 How difficult would it be to add a, selectable, low-latency GC to
- Ola Fosheim Grostad (2/9) Dec 05 2020 The only reasonable option for D is single threaded GC or ARC.
- Bruce Carneal (6/17) Dec 05 2020 OK. Some rationale? Do you, for example, believe that
- Ola Fosheim Grostad (4/8) Dec 05 2020 The GC needs to scan all the affected call stacks before it can
- Bruce Carneal (15/23) Dec 05 2020 GCs scan memory, sure. Lots of variations. Not germane. Not a
- Ola Fosheim Grostad (11/24) Dec 06 2020 Yes, but they don't allow low level programming. Go also freezes
- Bruce Carneal (11/39) Dec 06 2020 OK. Low latency GCs exist.
- Ola Fosheim Grostad (5/13) Dec 06 2020 Well, you could in theory avoid putting owning pointers on the
- Ola Fosheim Grostad (4/8) Dec 06 2020 And read barriers... I assume. However with single threaded
- Bruce Carneal (3/17) Dec 06 2020 'shared' with teeth?
- Ola Fosheim Grøstad (8/13) Dec 06 2020 It was more a hypothetical, as read barriers are too expensive.
- IGotD- (24/31) Dec 06 2020 In kernel programming there are plenty of atomic reference
- Ola Fosheim Grostad (9/25) Dec 06 2020 I am not sure if kernel authors want automatic memory management,
- Paulo Pinto (18/36) Dec 06 2020 They surely do.
- Ola Fosheim Grostad (11/19) Dec 06 2020 Didn't say anything about low level, only SIMD intrinsics, which
- Bruce Carneal (12/25) Dec 06 2020 So you must make the familiar "ease-of-programming" vs "x% of
- Ola Fosheim Grostad (11/18) Dec 06 2020 My impression from reading the forums is that people either use D
- Max Haughton (11/22) Dec 06 2020 It has to be either some kind of heavily customisable small GC
- Ola Fosheim Grostad (8/19) Dec 06 2020 ARC can be done incrementally, we can do it as a library first
- Max Haughton (12/33) Dec 06 2020 ARC with a library will have overhead unless the compiler/ABI is
- Ola Fosheim Grostad (10/15) Dec 06 2020 No, unique doesn't need indirection, neither does ARC, we put the
- Max Haughton (3/15) Dec 06 2020 https://gcc.godbolt.org/z/bnbMeY
- Ola Fosheim Grostad (4/24) Dec 06 2020 If you pass something as a parameter then there may or may not be
- IGotD- (16/23) Dec 06 2020 The Rust approach is interesting as it doesn't need an ARC
- Ola Fosheim Grostad (3/10) Dec 06 2020 I don't know, but I suspect that people that use D want something
- oddp (26/34) Dec 08 2020 What our closest competition, Nim, is up to with their mark-and-sweep re...
- oddp (29/37) Dec 08 2020 What our closest competition, Nim, is up to with their mark-and-sweep re...
How difficult would it be to add a selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"? I've heard Walter mention performance issues (write barriers IIRC). I'm also interested in the GC-flavor performance trade-offs but here I'm just asking about feasibility.
Dec 05 2020
On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
> How difficult would it be to add a selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"? I've heard Walter mention performance issues (write barriers IIRC). I'm also interested in the GC-flavor performance trade-offs but here I'm just asking about feasibility.
The only reasonable option for D is single threaded GC or ARC.
Dec 05 2020
On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
>> How difficult would it be to add a selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"? I've heard Walter mention performance issues (write barriers IIRC). I'm also interested in the GC-flavor performance trade-offs but here I'm just asking about feasibility.
> The only reasonable option for D is single threaded GC or ARC.
OK. Some rationale? Do you, for example, believe that no probable dlanger could benefit from a low-latency GC? That it is too hard to implement? That the language is somehow incompatible? That ...
Dec 05 2020
On Sunday, 6 December 2020 at 05:41:05 UTC, Bruce Carneal wrote:
> OK. Some rationale? Do you, for example, believe that no probable dlanger could benefit from a low-latency GC? That it is too hard to implement? That the language is somehow incompatible? That ...
The GC needs to scan all the affected call stacks before it can do incremental collection. Multi threaded GC is generally not compatible with low level programming.
Dec 05 2020
On Sunday, 6 December 2020 at 06:52:41 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 05:41:05 UTC, Bruce Carneal wrote:
>> OK. Some rationale? Do you, for example, believe that no probable dlanger could benefit from a low-latency GC? That it is too hard to implement? That the language is somehow incompatible? That ...
> The GC needs to scan all the affected call stacks before it can do incremental collection. Multi threaded GC is generally not compatible with low level programming.
GCs scan memory, sure. Lots of variations. Not germane. Not a rationale.
D is employed at multiple "levels". Whatever level you call it, Go and modern JVMs employ low latency GCs in multi-threaded environments. Some people would like to use D at that "level".
My question remains: how difficult would it be to bring such technology to D as a GC option? Is it precluded somehow by the language? Is it doable but quite a lot of effort because ...? Is it no big deal once you have the GC itself because you only need xyz hooks? Is it ...?
Also, I think Walter may have been concerned about read barrier overhead but, again, I'm looking for feasibility information. What would it take to get something that we could compare?
Dec 05 2020
On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
> GCs scan memory, sure. Lots of variations. Not germane. Not a rationale.
We need to freeze the threads when collecting stacks/globals.
> D is employed at multiple "levels". Whatever level you call it, Go and modern JVMs employ low latency GCs in multi-threaded environments. Some people would like to use D at that "level".
Yes, but they don't allow low level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency AFAIK.
> My question remains: how difficult would it be to bring such technology to D as a GC option? Is it precluded somehow by the language? Is it doable but quite a lot of effort because ...? Is it no big deal once you have the GC itself because you only need xyz hooks? Is it ...?
Get rid of the system stack and globals. Use only closures and put in a restrictive memory model. Then maybe you can get a fully no-freeze multi threaded GC. That would be a different language.
> Also, I think Walter may have been concerned about read barrier overhead but, again, I'm looking for feasibility information. What would it take to get something that we could compare?
Just add ARC + single threaded GC. And even that is quite expensive.
Dec 06 2020
On Sunday, 6 December 2020 at 08:12:58 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
>> GCs scan memory, sure. Lots of variations. Not germane. Not a rationale.
> We need to freeze the threads when collecting stacks/globals.
OK. Low latency GCs exist.
>> D is employed at multiple "levels". Whatever level you call it, Go and modern JVMs employ low latency GCs in multi-threaded environments. Some people would like to use D at that "level".
> Yes, but they don't allow low level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency AFAIK.
So, much of the difficulty in bringing low-latency GC to dlang would be the large code gen changes required. If it is a really big effort then that is all we need to know. Not worth it until we can see a big payoff and have more resources.
>> My question remains: how difficult would it be to bring such technology to D as a GC option? Is it precluded somehow by the language? Is it doable but quite a lot of effort because ...? Is it no big deal once you have the GC itself because you only need xyz hooks? Is it ...?
> Get rid of the system stack and globals. Use only closures and put in a restrictive memory model. Then maybe you can get a fully no-freeze multi threaded GC. That would be a different language.
It would be, but I don't think it is the only way to get lower latency GC. That said, if the code gen effort you mentioned earlier is a big deal, then no need to speculate/examine further.
>> Also, I think Walter may have been concerned about read barrier overhead but, again, I'm looking for feasibility information. What would it take to get something that we could compare?
> Just add ARC + single threaded GC. And even that is quite expensive.
Thanks for the feedback.
Dec 06 2020
On Sunday, 6 December 2020 at 08:36:49 UTC, Bruce Carneal wrote:
>> Yes, but they don't allow low level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency AFAIK.
> So, much of the difficulty in bringing low-latency GC to dlang would be the large code gen changes required. If it is a really big effort then that is all we need to know. Not worth it until we can see a big payoff and have more resources.
Well, you could in theory avoid putting owning pointers on the stack/globals or require that they are registered as GC roots. Then you don't have to scan the stack. All you need then is write barriers, IIRC.
Dec 06 2020
On Sunday, 6 December 2020 at 08:59:49 UTC, Ola Fosheim Grostad wrote:
> Well, you could in theory avoid putting owning pointers on the stack/globals or require that they are registered as GC roots. Then you don't have to scan the stack. All you need then is write barriers, IIRC.
And read barriers... I assume. However with single threaded incremental, write barriers should be enough.
Dec 06 2020
On Sunday, 6 December 2020 at 08:59:49 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 08:36:49 UTC, Bruce Carneal wrote:
>>> Yes, but they don't allow low level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency AFAIK.
>> So, much of the difficulty in bringing low-latency GC to dlang would be the large code gen changes required. If it is a really big effort then that is all we need to know. Not worth it until we can see a big payoff and have more resources.
> Well, you could in theory avoid putting owning pointers on the stack/globals or require that they are registered as GC roots. Then you don't have to scan the stack. All you need then is write barriers, IIRC.
'shared' with teeth?
Dec 06 2020
On Sunday, 6 December 2020 at 14:45:21 UTC, Bruce Carneal wrote:
>> Well, you could in theory avoid putting owning pointers on the stack/globals or require that they are registered as GC roots. Then you don't have to scan the stack. All you need then is write barriers, IIRC.
> 'shared' with teeth?
It was more a hypothetical, as read barriers are too expensive. But write barriers should be OK, so a single-threaded incremental collector could work well if D takes a principled stance that objects which are not 'shared' are never handed over to other threads without pinning them in the GC. Maybe a better option for D than ARC, as it is closer to what people are used to.
Dec 06 2020
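For readers who have not met the term, the write-barrier idea discussed above can be sketched in a few lines. This is a toy in C++, not D runtime code: it assumes a single-threaded incremental collector with tri-color marking and a Dijkstra-style insertion barrier, and every name in it (Color, Obj, ToyCollector, write_ref, mark_step, barrier_demo) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy tri-color marking state; all names here are hypothetical.
enum class Color { White, Gray, Black };

struct Obj {
    Color color = Color::White;
    std::vector<Obj*> fields;            // outgoing references
};

struct ToyCollector {
    std::vector<Obj*> gray;              // objects found but not yet scanned

    void shade(Obj* o) {
        if (o != nullptr && o->color == Color::White) {
            o->color = Color::Gray;
            gray.push_back(o);
        }
    }

    // Write barrier: every pointer store goes through this function.
    // Storing into an already scanned (black) owner shades the target,
    // so the collector cannot miss it.
    void write_ref(Obj& owner, std::size_t slot, Obj* target) {
        assert(slot < owner.fields.size());
        owner.fields[slot] = target;
        if (owner.color == Color::Black) shade(target);
    }

    // One bounded slice of marking: scan at most `budget` gray objects.
    void mark_step(std::size_t budget) {
        for (; budget > 0 && !gray.empty(); --budget) {
            Obj* o = gray.back();
            gray.pop_back();
            for (Obj* f : o->fields) shade(f);
            o->color = Color::Black;
        }
    }
};

// The mutator stores a white object into a black one between two slices.
bool barrier_demo() {
    Obj root, late;
    root.fields.resize(1);         // one empty reference slot
    ToyCollector gc;
    gc.shade(&root);               // root set
    gc.mark_step(1);               // root is scanned and turns black
    gc.write_ref(root, 0, &late);  // barrier shades `late`
    gc.mark_step(8);               // finish marking
    return late.color == Color::Black;
}
```

Without the check in write_ref, `late` would stay white and be treated as garbage even though `root` points to it. The cost that worries people in this thread is that every pointer store has to pay for that check.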
On Sunday, 6 December 2020 at 15:44:32 UTC, Ola Fosheim Grøstad wrote:
> It was more a hypothetical, as read barriers are too expensive. But write barriers should be OK, so a single-threaded incremental collector could work well if D takes a principled stance that objects which are not 'shared' are never handed over to other threads without pinning them in the GC. Maybe a better option for D than ARC, as it is closer to what people are used to.
In kernel programming there are plenty of atomic reference counted objects. The reason is that if you have a kernel that supports SMP you must have them, because you don't really know which CPU is working with a structure at any given time. These are often manually reference counted objects, which can lead to memory leak bugs, but those are not that hard to find.
Is automatic atomic reference counting a contender for kernels? In kernels you want to reduce the number of increments/decrements of the counts. Therefore the Rust approach using 'clone' is better unless there is some optimizer that can figure it out. Performance is important in kernels; you don't want the kernel to steal useful CPU time that otherwise should go to programs.
In general I think that reference counting should be supported in D, not only implicitly but also under the hood with fat pointers. This will make D more attractive to performance applications. Another advantage is that reference counting can use malloc/free directly if needed, without any complicated GC layer with associated metadata.
Also, tracing GC in a kernel is in my opinion not desirable. For the reasons I previously mentioned, you want to reduce metadata, you want to reduce CPU time, you want to reduce fragmentation. Special allocators for structures are often used.
Dec 06 2020
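As an illustration of the manual atomic reference counting described above, here is a minimal sketch in C++ using std::atomic. It only assumes the usual get/put discipline; the names Counted, ref_get, ref_put and refcount_demo are hypothetical and not taken from any particular kernel.

```cpp
#include <atomic>
#include <cassert>

// A manually reference counted object; all names are hypothetical.
struct Counted {
    std::atomic<int> refs{1};   // the creator holds the first reference
    int payload = 0;
};

// Take an extra reference, e.g. before handing the object to another CPU.
inline Counted* ref_get(Counted* c) {
    c->refs.fetch_add(1, std::memory_order_relaxed);
    return c;
}

// Drop a reference. Returns true if this call released the last one
// and freed the object.
inline bool ref_put(Counted* c) {
    int old = c->refs.fetch_sub(1, std::memory_order_acq_rel);
    assert(old > 0);            // a put without a matching get is a bug
    if (old == 1) {
        delete c;
        return true;
    }
    return false;
}

bool refcount_demo() {
    Counted* c = new Counted;
    ref_get(c);                 // second owner
    bool first = ref_put(c);    // one reference still outstanding
    bool second = ref_put(c);   // last reference, object is deleted
    return !first && second;
}
```

Each get and put is an atomic read-modify-write on a shared cache line, which is why the post argues for keeping the number of increments and decrements low. A forgotten ref_put is the leak mentioned above.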
On Sunday, 6 December 2020 at 17:35:19 UTC, IGotD- wrote:
> Is automatic atomic reference counting a contender for kernels? In kernels you want to reduce the number of increments/decrements of the counts. Therefore the Rust approach using 'clone' is better unless there is some optimizer that can figure it out. Performance is important in kernels; you don't want the kernel to steal useful CPU time that otherwise should go to programs.
I am not sure if kernel authors want automatic memory management, they tend to want full control and transparency. Maybe something people who write device drivers would consider.
> In general I think that reference counting should be supported in D, not only implicitly but also under the hood with fat pointers. This will make D more attractive to performance applications. Another advantage is that reference counting can use malloc/free directly if needed, without any complicated GC layer with associated metadata.
Yes, I would like to see it, just expect that there will be protests when people realize that they have to make ownership explicit.
> Also, tracing GC in a kernel is in my opinion not desirable. For the reasons I previously mentioned, you want to reduce metadata, you want to reduce CPU time, you want to reduce fragmentation. Special allocators for structures are often used.
Yes, an ARC solution should support fixed size allocators for types that are frequently allocated to get better speed.
Dec 06 2020
On Sunday, 6 December 2020 at 08:12:58 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 07:45:17 UTC, Bruce Carneal wrote:
>> GCs scan memory, sure. Lots of variations. Not germane. Not a rationale.
> We need to freeze the threads when collecting stacks/globals.
>> D is employed at multiple "levels". Whatever level you call it, Go and modern JVMs employ low latency GCs in multi-threaded environments. Some people would like to use D at that "level".
> Yes, but they don't allow low level programming. Go also freezes to sync threads; this has a rather profound impact on code generation. They have spent a lot of effort on sync instructions in code gen to lower the latency AFAIK.
They surely do.
Looking forward to seeing D achieve the same performance level as .NET 5 is capable of, beating Google's own gRPC C++ implementation; only the Rust implementation beats it.
https://www.infoq.com/news/2020/12/aspnet-core-improvement-dotnet-5/
> Many of the performance improvements in the HTTP/2 implementation are related to the reimplementation from [...]
And while on the subject of low level programming in JVM or .NET.
https://www.infoq.com/news/2020/12/net-5-runtime-improvements/
> "still is this kind of idea that managed languages are not quite up to the task for some of those low-level super performance sensitive components, [...]
Rich Lander being one of the main .NET architects, and upcoming Java 16 features, http://openjdk.java.net/jeps/389 (JNI replacement), http://openjdk.java.net/jeps/393 (native memory management).
As I already mentioned in another thread, rebooting the language to pull in imaginary crowds will only do more damage than good, while the ones deemed unusable by the same imaginary crowd just keep winning market share, slowly and steadily, even if it takes yet another couple of years.
Dec 06 2020
On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
> And while on the subject of low level programming in JVM or .NET.
> https://www.infoq.com/news/2020/12/net-5-runtime-improvements/
Didn't say anything about low level, only SIMD intrinsics, which isn't really low level? It also stated "When it came to something that is pure CPU raw computation doing nothing but number crunching, in general, you can still eke out better performance if you really focus on "pedal to the metal" with your C/C++ code." So it is more of a Go contender, and Go is not a systems level language... Apples and oranges.
> As I already mentioned in another thread, rebooting the language to pull in imaginary crowds will only do more damage than good, while the ones deemed unusable by the same imaginary crowd just keep winning market share, slowly and steadily, even if it takes yet another couple of years.
A fair number of people here are in that imaginary crowd. So, I guess it isn't imaginary...
Dec 06 2020
On Sunday, 6 December 2020 at 16:42:00 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 14:44:25 UTC, Paulo Pinto wrote:
>> And while on the subject of low level programming in JVM or .NET.
>> https://www.infoq.com/news/2020/12/net-5-runtime-improvements/
> Didn't say anything about low level, only SIMD intrinsics, which isn't really low level? It also stated "When it came to something that is pure CPU raw computation doing nothing but number crunching, in general, you can still eke out better performance if you really focus on "pedal to the metal" with your C/C++ code."
So you must make the familiar "ease-of-programming" vs "x% of performance" choice, where 'x' is presumably much smaller than earlier.
> So it is more of a Go contender, and Go is not a systems level language... Apples and oranges.
D is good for systems level work but that's not all. I use it for projects where, in the past, I'd have split the work between two languages (Python and C/C++). I much prefer working with a single language that spans the problem space.
If there is a way to extend D's reach with zero or a near-zero complexity increase as seen by the programmer, I believe we should (as/when resources allow of course).
Dec 06 2020
On Sunday, 6 December 2020 at 17:28:52 UTC, Bruce Carneal wrote:
> D is good for systems level work but that's not all. I use it for projects where, in the past, I'd have split the work between two languages (Python and C/C++). I much prefer working with a single language that spans the problem space.
My impression from reading the forums is that people either use D as a replacement for C/C++ or for Python/numpy, so I think your experience covers the essential use case that dominates current D usage? Any improvements have to improve both dimensions, I agree.
> If there is a way to extend D's reach with zero or a near-zero complexity increase as seen by the programmer, I believe we should (as/when resources allow of course).
ARC involves a complexity increase, to some extent. Library authors have to think in a more principled way about when objects should be phased out and destructed, which I think tends to lead to better programs. It would also allow for faster precise collection. So it could be beneficial for all.
Dec 06 2020
On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 05:16:26 UTC, Bruce Carneal wrote:
>> How difficult would it be to add a selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"? I've heard Walter mention performance issues (write barriers IIRC). I'm also interested in the GC-flavor performance trade-offs but here I'm just asking about feasibility.
> The only reasonable option for D is single threaded GC or ARC.
It has to be either some kind of heavily customisable small GC (i.e. with our resources the GC cannot please everyone), or ARC. The GC as it is just hurts the language.
Realistically, we probably need some kind of working group or at least serious discussion to really narrow down where to go in the future. The GC as it is now must go, we need borrowing to work with more than just pointers, etc.
The issue is that it can't just be done incrementally, it needs to be specified beforehand.
Dec 06 2020
On Sunday, 6 December 2020 at 10:44:39 UTC, Max Haughton wrote:
> On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad wrote:
> It has to be either some kind of heavily customisable small GC (i.e. with our resources the GC cannot please everyone), or ARC. The GC as it is just hurts the language.
> Realistically, we probably need some kind of working group or at least serious discussion to really narrow down where to go in the future. The GC as it is now must go, we need borrowing to work with more than just pointers, etc.
> The issue is that it can't just be done incrementally, it needs to be specified beforehand.
ARC can be done incrementally: we can do it as a library first and use a modified version of the existing GC for detecting failed borrows at runtime during testing. But all libraries that use owning pointers need ownership to be made explicit.
A static borrow checker and an ARC optimizer need a high level IR though. A lot of work though.
Dec 06 2020
On Sunday, 6 December 2020 at 11:07:50 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 10:44:39 UTC, Max Haughton wrote:
>> On Sunday, 6 December 2020 at 05:29:37 UTC, Ola Fosheim Grostad wrote:
>> It has to be either some kind of heavily customisable small GC (i.e. with our resources the GC cannot please everyone), or ARC. The GC as it is just hurts the language.
>> Realistically, we probably need some kind of working group or at least serious discussion to really narrow down where to go in the future. The GC as it is now must go, we need borrowing to work with more than just pointers, etc.
>> The issue is that it can't just be done incrementally, it needs to be specified beforehand.
> ARC can be done incrementally: we can do it as a library first and use a modified version of the existing GC for detecting failed borrows at runtime during testing. But all libraries that use owning pointers need ownership to be made explicit.
> A static borrow checker and an ARC optimizer need a high level IR though. A lot of work though.
ARC with a library will have overhead unless the compiler/ABI is changed, e.g. unique_ptr in C++ has an indirection.
The AST effectively is a high-level IR. Not a good one, but good enough. The system Walter has built shows the means are there in the compiler already.
As things are at the moment, the annotations we have for pointers like scope go a long way, but the language doesn't deal with things like borrowing structs (and the contents of structs, i.e. making a safe vector) properly yet. That is what needs thinking about.
Dec 06 2020
On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
> ARC with a library will have overhead unless the compiler/ABI is changed, e.g. unique_ptr in C++ has an indirection.
No, unique doesn't need indirection, neither does ARC; we put the ref count at a negative offset. shared_ptr is a fat pointer with the ref count as a separate object to support existing C libraries, and to make weak_ptr easy to implement. But no need for indirection.
> The AST effectively is a high-level IR. Not a good one, but good enough. The system Walter has built shows the means are there in the compiler already.
I think you need a new IR, but it does not have to be used for code gen, it can point back to the AST nodes that represent ARC pointer assignments. One could probably translate the one used in Rust, even.
Dec 06 2020
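The "ref count at a negative offset" layout mentioned above can be sketched as follows. This is a hedged C++ illustration, not how any D or C++ library actually does it: the count lives in a header allocated directly in front of the payload, so the handle is a plain pointer to the payload. The names Header, Point, rc_new, header_of, rc_retain, rc_release and layout_demo are all hypothetical, and the count is deliberately non-atomic to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Header placed directly in front of the payload. Padding it to the
// maximum fundamental alignment keeps the payload correctly aligned.
struct alignas(std::max_align_t) Header { std::size_t refs; };

struct Point { double x, y; };   // trivially destructible payload

// One allocation holds header and payload; callers only see the payload.
Point* rc_new(double x, double y) {
    void* raw = std::malloc(sizeof(Header) + sizeof(Point));
    if (raw == nullptr) throw std::bad_alloc();
    Header* h = new (raw) Header{1};
    return new (h + 1) Point{x, y};
}

// The "negative offset": step back from the payload to its header.
inline Header* header_of(Point* p) {
    return reinterpret_cast<Header*>(p) - 1;
}

void rc_retain(Point* p) { ++header_of(p)->refs; }

// Returns true if the last reference was dropped and the block was freed.
bool rc_release(Point* p) {
    Header* h = header_of(p);
    assert(h->refs > 0);
    if (--h->refs == 0) {
        std::free(h);            // frees header and payload together
        return true;
    }
    return false;
}

bool layout_demo() {
    Point* p = rc_new(1.0, 2.0);
    rc_retain(p);
    bool early = rc_release(p);
    bool last = rc_release(p);
    // On the mainstream standard libraries a unique_ptr with the default
    // deleter is the size of a raw pointer; the standard does not promise it.
    return !early && last && sizeof(std::unique_ptr<int>) == sizeof(int*);
}
```

Reaching the count costs one subtraction from a pointer the caller already holds, so no second pointer has to be stored or followed. That is a separate question from the parameter-passing overhead that the godbolt link in the next post is about.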
On Sunday, 6 December 2020 at 11:35:17 UTC, Ola Fosheim Grostad wrote:
> On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
>> [...]
> No, unique doesn't need indirection, neither does ARC; we put the ref count at a negative offset. shared_ptr is a fat pointer with the ref count as a separate object to support existing C libraries, and to make weak_ptr easy to implement. But no need for indirection.
https://gcc.godbolt.org/z/bnbMeY
>> [...]
> I think you need a new IR, but it does not have to be used for code gen, it can point back to the AST nodes that represent ARC pointer assignments. One could probably translate the one used in Rust, even.
Dec 06 2020
On Sunday, 6 December 2020 at 14:11:41 UTC, Max Haughton wrote:
> On Sunday, 6 December 2020 at 11:35:17 UTC, Ola Fosheim Grostad wrote:
>> On Sunday, 6 December 2020 at 11:27:39 UTC, Max Haughton wrote:
>>> [...]
>> No, unique doesn't need indirection, neither does ARC; we put the ref count at a negative offset. shared_ptr is a fat pointer with the ref count as a separate object to support existing C libraries, and to make weak_ptr easy to implement. But no need for indirection.
> https://gcc.godbolt.org/z/bnbMeY
If you pass something as a parameter then there may or may not be an extra reference involved. Not specific to smart pointers, but ARC optimization should take care of that.
>>> [...]
>> I think you need a new IR, but it does not have to be used for code gen, it can point back to the AST nodes that represent ARC pointer assignments. One could probably translate the one used in Rust, even.
Dec 06 2020
On Sunday, 6 December 2020 at 11:07:50 UTC, Ola Fosheim Grostad wrote:
> ARC can be done incrementally: we can do it as a library first and use a modified version of the existing GC for detecting failed borrows at runtime during testing. But all libraries that use owning pointers need ownership to be made explicit.
> A static borrow checker and an ARC optimizer need a high level IR though. A lot of work though.
The Rust approach is interesting as it doesn't need an ARC optimizer. Everything is a move, so no increment/decrement is done when moving. The count is only increased when the programmer decides to 'clone' the reference. This inherently becomes optimized without any compiler support. However, this requires that the programmer inserts 'clone' when necessary, so it isn't really automatic.
I was thinking about how to deal with this in D, and the question is if it would be better to be able to control move-by-default on a per-type basis. This way we can implement Rust style reference counting without intruding too much on the rest of the language. The question is if we want this or if we should go for a fully automated approach where the programmer doesn't need to worry about 'clone'.
Dec 06 2020
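The difference between a move and an explicit 'clone' can be shown with C++'s std::shared_ptr, which is used here only as a stand-in for Rust's Rc/Arc: in C++ the copy is the default and the move must be requested, the reverse of Rust. The struct Counts and the function move_vs_copy are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Counts {
    long after_move;
    long after_copy;
};

Counts move_vs_copy() {
    auto a = std::make_shared<int>(42);

    auto b = std::move(a);   // ownership transfer, the count is not touched
    assert(!a);              // a moved-from shared_ptr is empty
    long after_move = b.use_count();

    auto c = b;              // explicit copy, the analogue of Rust's clone()
    long after_copy = b.use_count();

    return {after_move, after_copy};
}
```

After the move there is still a single owner, and only the copy bumps the count to two. Making the move the default per type, as proposed above, would give D the cheap path without an ARC optimizer, at the price of the programmer writing the copy explicitly.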
On Sunday, 6 December 2020 at 12:58:44 UTC, IGotD- wrote:
> I was thinking about how to deal with this in D, and the question is if it would be better to be able to control move-by-default on a per-type basis. This way we can implement Rust style reference counting without intruding too much on the rest of the language. The question is if we want this or if we should go for a fully automated approach where the programmer doesn't need to worry about 'clone'.
I don't know, but I suspect that people that use D want something more high level than Rust? But I don't use Rust, so...
Dec 06 2020
On 06.12.20 06:16, Bruce Carneal via Digitalmars-d-learn wrote:
> How difficult would it be to add a selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"? I've heard Walter mention performance issues (write barriers IIRC). I'm also interested in the GC-flavor performance trade-offs but here I'm just asking about feasibility.
What our closest competition, Nim, is up to with their mark-and-sweep replacement ORC [1]:
ORC is the existing ARC algorithm (first shipped in version 1.2) plus a cycle collector [...]
ARC is Nim’s pure reference-counting GC, however, many reference count operations are optimized away: Thanks to move semantics, the construction of a data structure does not involve RC operations. And thanks to “cursor inference”, another innovation of Nim’s ARC implementation, common data structure traversals do not involve RC operations either! [...]
Benchmark:
Metric/algorithm   ORC         Mark&Sweep
Latency (Avg)      320.49 us   65.31 ms
Latency (Max)      6.24 ms     204.79 ms
Requests/sec       30963.96    282.69
Transfer/sec       1.48 MB     13.80 KB
Max memory         137 MiB     153 MiB
That’s right, ORC is over 100 times faster than the M&S GC. The reason is that ORC only touches memory that the mutator touches, too. [...]
- uses 2x less memory than classical GCs
- can be orders of magnitudes faster in throughput
- offers sub-millisecond latencies
- suited for (hard) realtime systems
- no “stop the world” phase
- oblivious to the size of the heap or the used stack space.
[1] https://nim-lang.org/blog/2020/12/08/introducing-orc.html
Dec 08 2020
On 06.12.20 06:16, Bruce Carneal via Digitalmars-d-learn wrote:
> How difficult would it be to add a selectable, low-latency GC to dlang? Is it closer to "we can't get there from here" or "no big deal if you already have the low-latency GC in hand"? I've heard Walter mention performance issues (write barriers IIRC). I'm also interested in the GC-flavor performance trade-offs but here I'm just asking about feasibility.
What our closest competition, Nim, is up to with their mark-and-sweep replacement ORC [1]:
ORC is the existing ARC algorithm (first shipped in version 1.2) plus a cycle collector [...]
ARC is Nim’s pure reference-counting GC, however, many reference count operations are optimized away: Thanks to move semantics, the construction of a data structure does not involve RC operations. And thanks to “cursor inference”, another innovation of Nim’s ARC implementation, common data structure traversals do not involve RC operations either! [...]
Benchmark:
Metric/algorithm   ORC         Mark&Sweep
Latency (Avg)      320.49 us   65.31 ms
Latency (Max)      6.24 ms     204.79 ms
Requests/sec       30963.96    282.69
Transfer/sec       1.48 MB     13.80 KB
Max memory         137 MiB     153 MiB
That’s right, ORC is over 100 times faster than the M&S GC. The reason is that ORC only touches memory that the mutator touches, too. [...]
- uses 2x less memory than classical GCs
- can be orders of magnitudes faster in throughput
- offers sub-millisecond latencies
- suited for (hard) realtime systems
- no “stop the world” phase
- oblivious to the size of the heap or the used stack space.
There's also some discussion on /r/programming [2] and hackernews [3], but it hasn't taken off yet.
[1] https://nim-lang.org/blog/2020/12/08/introducing-orc.html
[2] https://old.reddit.com/r/programming/comments/k95cc5/introducing_orc_nim_nextgen_memory_management/
[3] https://news.ycombinator.com/item?id=25345770
Dec 08 2020