digitalmars.D - A separate GC idea - multiple D GCs
- Chris Katko (42/42) Jan 21 2022 So I was going to ask this related question in my other thread
- rikki cattermole (15/21) Jan 21 2022 Indeed we have thought about this.
- Ola Fosheim Grøstad (13/23) Jan 22 2022 Yes, this is better, although you would need to change the
- rikki cattermole (7/10) Jan 22 2022 This isn't what I had in mind.
- Ola Fosheim Grøstad (10/12) Jan 22 2022 I don't quite see how you plan to make this work without tracking
- rikki cattermole (20/20) Jan 22 2022 That's not the direction I'm thinking. Its not about freeing memory. Its...
- Ola Fosheim Grøstad (21/32) Jan 22 2022 I believe the number of roots is irrelevant, what you need is to
- rikki cattermole (10/23) Jan 22 2022 Yeah those are roots. They still have to be added into the global heap
- Ola Fosheim Grøstad (4/6) Jan 22 2022 Well, but I would never use it. Forking is crazy expensive.
- Ola Fosheim Grøstad (21/27) Jan 22 2022 Here are some downsides of a forking collector:
- rikki cattermole (4/10) Jan 22 2022 Agreed. There is no one size fits all solution.
- Elronnd (8/17) Jan 22 2022 Probably likely to evict some stuff, but I don't see why you
- Ola Fosheim Grøstad (14/22) Jan 22 2022 If you change the TLB, then affected address ranges should in
- Elronnd (3/6) Jan 22 2022 I assume you would only lose TLB, not cache.
- Ola Fosheim Grøstad (6/12) Jan 23 2022 I wouldn't make any assumptions, what I get from a quick Google
- max haughton (4/16) Jan 23 2022 L1 cache is often virtually addressed but physically tagged so
- IGotD- (15/22) Jan 22 2022 Does C# have thread pinned GC memory? I haven't seen anything
- Adam Ruppe (13/20) Jan 21 2022 The big problem is that data isn't thread local (including TLS);
- Tejas (3/10) Jan 21 2022 Is this intended behaviour? Or was it accidental? Is it atleast
- H. S. Teoh (28/40) Jan 21 2022 Immutable data is shared across threads by default. So any immutable
- Adam D Ruppe (3/6) Jan 21 2022 No.
- Ali Çehreli (13/15) Jan 21 2022 I ended up with that design by accident: My D library had to spawn
- Walter Bright (3/8) Jan 21 2022 Yes. The trouble is what happens when a pointer in one pool is cast to a...
- H. S. Teoh (13/22) Jan 21 2022 It almost makes one want to tag pointer types at compile-time as
- Tejas (3/17) Jan 21 2022 Isn't going global from thread local done via `cast(shared)`
- Sebastiaan Koppe (10/19) Jan 22 2022 Instead of segregating GC by execution contexts like threads or
- rikki cattermole (4/6) Jan 22 2022 For my proposal, I may call it a fiber, but in reality it can just as
- Ola Fosheim Grøstad (4/8) Jan 22 2022 The term «task» is quite generic and covers fibers as well as
So I was going to ask this related question in my other thread but I now see it's hit over 30 replies (!) and the discussion is interesting but a separate topic.

So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event. Basically, you cordon off your memory use like you would with say a memory pool, so that when a GC freezes it only freezes that section. Smaller, independent sections mean less freeze time and the rest of the threads keep running on multiple core CPUs.

There are obviously some issues:

1 - Is the D GC designed to allow something like this? Can it be prevented from walking into areas it shouldn't? I know you can malloc memory the GC "shouldn't" touch if you never make pointers into it?

2 - It will definitely be more complex, and D can be too complex and finicky already. (SEGFAULT TIME.) So patterns to prevent that will be needed. "Am I reading in my thread, or thread 37?"

3 - Any boundary where threads "touch" will either be a nightmare or at least need some sort of mechanism to cross and synchronize across said boundary. And a frozen thread, synchronized, will freeze any reads by non-frozen threads. [Though one might be able to schedule garbage collections only when massive reads/writes aren't currently needed.]

I think the simple, easiest way to try this would be to just spawn multiple processes [each having their own D GC collector] and somehow share memory between them. But I have no idea if it's easy to work around the inherent "multiple process = context switch" overhead making it actually slower.

As for the "why?": I'm not sure if there are huge benefits, mild benefits, or benefits in only niche scenarios (like games, which is the kind I play with).
But I'm just curious about the prospect. Because while an "iterative GC" requires... an entire new GC, splitting off multiple existing D garbage collectors into their own fields could maybe work. [And once again, without benchmarks, a single D GC may be more than fast enough for my needs.] But I'm curious and like to know my options, suggestions, and people's general thoughts on it. Thanks, have a great day!
Jan 21 2022
On 22/01/2022 2:56 AM, Chris Katko wrote:So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Indeed we have thought about this. What I want is a fiber-aware GC, and that implicitly means thread-local too. But I'm not sure it would be any use to add the hooks to the existing GC; it would need to be properly designed to take advantage of it.Because while an "iterative GC" requires... an entire new GC.The existing GC has absolutely horrible code. I tried a while back to get it to support snapshotting (Windows-specific concurrency for GCs) and I couldn't find where it even did the scanning for pointers... yeah. https://github.com/dlang/druntime/blob/master/src/core/internal/gc/impl/conservative/gc.d After a quick look it does look like it has been improved somewhat with more comments since then. What we need is a full reimplementation of the GC that is easy to dig into. After that, forking, precise, and generational should all be pretty straightforward to implement.
Jan 21 2022
On Friday, 21 January 2022 at 14:09:18 UTC, rikki cattermole wrote:On 22/01/2022 2:56 AM, Chris Katko wrote:Yes, this is better, although you would need to change the language semantics so that non-shared means local to a fiber/actor as the fiber/actor can move from one thread to another. You also need a per-thread runtime that can collect from sleeping fibers/actors and in the worst case signal other thread runtimes to collect theirs. Otherwise you would run out of memory when memory is available... In general I'd say you would want to have support for many different memory managers in the same executable as different fibers/actors may have different memory management needs.So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Indeed we have thought about this. What I want is a fiber aware GC and that implicitly means thread-local too.
Jan 22 2022
On 22/01/2022 10:24 PM, Ola Fosheim Grøstad wrote:Yes, this is better, although you would need to change the language semantics so that non-shared means local to a fiber/actor as the fiber/actor can move from one thread to another.This isn't what I had in mind. By telling the GC about fibers, it should be able to take advantage of this information and limit the time it takes to do stop the world scanning. If it knows memory is live, it doesn't need to scan for it. If it knows that memory isn't referenced in that fiber anymore, then it only needs to confirm that it isn't anywhere else.
Jan 22 2022
On Saturday, 22 January 2022 at 10:05:07 UTC, rikki cattermole wrote:If it knows that memory isn't referenced in that fiber anymore, then it only needs to confirm that it isn't anywhere else.I don't quite see how you plan to make this work without tracking dirty objects, which has a performance/storage penalty. Fiber1 has memory chunk A Fiber2 has memory chunks A, B You confirm that sleeping Fiber1 does not have access to B Fiber2 sets A.next = B At this point, you assume wrongly that Fiber1 does not have access to B?
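The race described above can be made concrete in a short D sketch (the `Node` type is hypothetical, invented purely for illustration; the two "fibers" are simulated in one thread, since only the ordering of operations matters here):

```d
// Illustration of the staleness problem: a per-fiber liveness
// conclusion is invalidated by a mutation in another context.
class Node { Node next; }

void main()
{
    auto a = new Node;   // chunk A, reachable from Fiber1 and Fiber2
    auto b = new Node;   // chunk B, reachable only from Fiber2

    // A scan of Fiber1's roots at this point concludes that
    // Fiber1 cannot reach B...
    a.next = b;          // ...then Fiber2 links B into A...

    // ...and now B IS reachable from Fiber1 through A, so the
    // earlier conclusion is stale unless mutations are tracked.
    assert(a.next is b);
}
```

This is exactly the case write barriers (or dirty-object tracking) exist to catch.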
Jan 22 2022
That's not the direction I'm thinking. It's not about freeing memory. It's about deferment. The best way to make your code faster is to simply do less work. So to make global memory scanning faster, we either have to decrease the number of roots (likely to end with segfaults) or decrease the memory ranges to scan. And since we have a context to associate with memory that is allocated, and with it roots, we have a pretty good idea of whether that memory is still alive during the execution of said context. Think about the context of a web server/service. You have some global heap memory: router, plugins, adminy type stuff, that sort of thing. Most of that sort of memory is either reused or sticks around till the end of the process. But then there is request-specific memory that gets allocated then thrown away. If you can defer needing to add those memory ranges into the global heap, that'll speed up scanning drastically, as that is where most of your free-able memory will originate from. To top it off, when you have a concurrent GC design like forking or snapshotting, you can scan those dead fibers' memory ranges and it won't require stopping the world.
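The deferment idea above can be sketched in D. Note that `RequestRegion` and its methods are invented names for illustration, not a real druntime API; the sketch only shows the bookkeeping shape (ranges tracked per context, promoted to the global scan set only on escape):

```d
// Hypothetical per-request region: allocations are tracked locally
// and NOT registered with the global heap scan unless they escape.
struct RequestRegion
{
    void[][] ranges;   // memory this request allocated

    void* alloc(size_t n)
    {
        auto block = new void[](n); // GC-backed here for brevity
        ranges ~= block;            // tracked locally only
        return block.ptr;
    }

    // Only a pointer that escapes the request would need its range
    // promoted into the set a full global scan has to walk
    // (e.g. via something like GC.addRange in a real design).
    void escape(size_t i) { /* promote ranges[i] globally */ }
}

void main()
{
    RequestRegion req;
    auto p = req.alloc(64);
    assert(p !is null);
    assert(req.ranges.length == 1);
    // request ends with no escapes: the whole region is dropped
    // without the global collector ever scanning these ranges
}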
Jan 22 2022
On Saturday, 22 January 2022 at 10:38:36 UTC, rikki cattermole wrote:So to make global memory scanning faster, we either have to decrease the number of roots (likely to end with segfaults) or to decrease the memory ranges to scan for.I believe the number of roots is irrelevant; what you need is heap separation where you ensure through the type system (or the programmer) that the pointed-to object is live. So, if you let a fiber have its own heap of this kind, then you can reduce the scanning? But you need to either improve the type system or put the burden on the programmer in C++-style. Either way, I believe distinguishing between ownership and borrows can improve GC performance, not only unique_ptr style ownership.But then there is request specific memory that gets allocated then thrown away. If you can defer needing to add those memory ranges into the global heap, that'll speed up scanning drastically as that is where most of your free-able memory will originate from.But you need to scan those memory ranges if they can contain pointers that can reach GC-memory?To top it off, when you have a concurrent GC design like forking or snapshotting, you can scan for those dead fibers memory ranges and it won't require stopping the world.You want to require taking a snapshot-copy of the pointer-containing heap? That will make the collector much less useful, I think.
Jan 22 2022
On 22/01/2022 11:59 PM, Ola Fosheim Grøstad wrote:Yeah those are roots. They still have to be added into the global heap to be scanned. The goal is to decrease the number of memory ranges that need to be scanned in a full scan: Cost = Roots * Ranges. Get Ranges down (until a range actually does need to be scanned) and the cost drops; that's the goal of what I am suggesting!But then there is request specific memory that gets allocated then thrown away. If you can defer needing to add those memory ranges into the global heap, that'll speed up scanning drastically as that is where most of your free-able memory will originate from.But you need to scan those memory ranges if they can contain pointers that can reach GC-memory?We already have forking, and that is super useful for getting user threads to not stop. We are only missing snapshotting for Windows now.To top it off, when you have a concurrent GC design like forking or snapshotting, you can scan for those dead fibers memory ranges and it won't require stopping the world.You want to require taking a snapshot-copy of the pointer-containing heap? That will make the collector much less useful, I think.
Jan 22 2022
On Saturday, 22 January 2022 at 11:06:24 UTC, rikki cattermole wrote:We already have forking, and that is super useful for getting user threads to not stop.Well, but I would never use it. Forking is crazy expensive. Messes with TLB and therefore caches IIRC.
Jan 22 2022
On Saturday, 22 January 2022 at 11:27:39 UTC, Ola Fosheim Grøstad wrote:On Saturday, 22 January 2022 at 11:06:24 UTC, rikki cattermole wrote:Here are some downsides of a forking collector:

1. messes with the TLB, wipes all caches completely (AFAIK).
2. will hog extra threads, so if your program is using all cores, you will see a penalty
3. requires reduced pointer-mutation activity in the main program, so the forking collector has to saturate the data bus in order to complete quickly, which is bad for the main process
4. requires carefulness in configuring OS resource handling
5. if you actually are out of memory, or close to it, then there is no way for you to fork, so it will fail and the process will instead be killed
6. makes it more difficult to coordinate with real-time threads (throttling/backpressure)
7. not really portable, platform dependent

A forking collector is just not suitable for system-level programming; it is very much a solution for a high-level programming language running on hardware with lots of headroom. If you are going high level, you might as well introduce write barriers for pointer mutation.We already have forking, and that is super useful for getting user threads to not stop.Well, but I would never use it. Forking is crazy expensive. Messes with TLB and therefore caches IIRC.
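For readers unfamiliar with how a forking collector gets its snapshot at all: after fork(), the child sees a copy-on-write image of the parent's memory as it was at fork time, so it can mark from that image while the parent keeps mutating. A minimal POSIX-only D sketch (the "scan" is just reading one word, for illustration; no real marking happens):

```d
// Demonstrates the COW snapshot a forking collector relies on:
// the child reads the pre-fork value even though the parent
// overwrites it concurrently.
import core.sys.posix.unistd : fork, _exit;
import core.sys.posix.sys.wait : waitpid;
import core.stdc.stdio : printf;

__gshared int heapWord = 42; // stands in for some heap data

void main()
{
    auto pid = fork();
    if (pid == 0)
    {
        // Child: its address space is a snapshot taken at fork
        // time. A real collector would mark from the roots here.
        printf("child sees %d\n", heapWord);
        _exit(0);
    }
    heapWord = 7;          // parent mutates; COW gives the child
    waitpid(pid, null, 0); // its own unmodified page regardless
}
```

The downsides listed above are the price of this trick: the COW faults, page copies, and TLB traffic are exactly what makes the fork "crazy expensive" on a busy heap.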
Jan 22 2022
On 23/01/2022 1:39 AM, Ola Fosheim Grøstad wrote:A forking collector is just not suitable for system level programming, it is very much a solution for a high level programming language running on hardware with lots of headroom. If you are going high level, you might as well introduce write barriers for pointer mutation.Agreed. There is no one-size-fits-all solution. Having more GCs, including ones that require write barriers, as options would be a great thing!
Jan 22 2022
On Saturday, 22 January 2022 at 12:39:44 UTC, Ola Fosheim Grøstad wrote:Here are some downsides of a forking collector: 1. messes with TLB, wipes all caches completely (AFAIK).Probably likely to evict some stuff, but I don't see why you would lose everything.2. will hog extra threads, thus if your program is using all cores, you will see a penaltyIt probably isn't using all threads, though.5. if you actually are out of memory, or close to it, then there is no way for you to fork, so it will fail and the process will instead be killedYou can fall back to regular gc if fork fails (and I think the fork gc does this).If you are going high level, you might as well introduce write barriers for pointer mutation.I agree with your other critiques, and I agree with this.
Jan 22 2022
On Saturday, 22 January 2022 at 16:45:02 UTC, Elronnd wrote:On Saturday, 22 January 2022 at 12:39:44 UTC, Ola Fosheim Grøstad wrote:If you change the TLB, then affected address ranges should in general be flushed, although maybe this is too pessimistic in the case of a fork. I don't know the details of what different CPU/MMU hardware implementations require and how various OSes deal with this. But both the fork itself and the COW page copying that occurs when the process writes to memory pages after the fork hurt performance.Here are some downsides of a forking collector: 1. messes with TLB, wipes all caches completely (AFAIK).Probably likely to evict some stuff, but I don't see why you would lose everything.You can fall back to regular gc if fork fails (and I think the fork gc does this).Good point, but if you succeed with the fork in a low memory situation then you are at risk of being killed by the Linux OOM killer. Maybe only the GC collector will be killed since it is a child process, or how does this work? So what happens then, will D detect this failure and switch to a cheaper GC collection process?
Jan 22 2022
On Saturday, 22 January 2022 at 19:43:33 UTC, Ola Fosheim Grøstad wrote:If you change the TLB, then affected address ranges should in general be flushed although maybe this is too pessimistic in the case of a fork.I assume you would only lose TLB, not cache.
Jan 22 2022
On Saturday, 22 January 2022 at 21:45:36 UTC, Elronnd wrote:On Saturday, 22 January 2022 at 19:43:33 UTC, Ola Fosheim Grøstad wrote:I wouldn't make any assumptions, what I get from a quick Google search fits with what I've read on this topic before: on ARM a TLB flush implies a cache flush, and the Linux documentation https://tldp.org/LDP/khg/HyperNews/get/memory/flush.html describes the pattern: flush cache, modify, flush TLB.If you change the TLB, then affected address ranges should in general be flushed although maybe this is too pessimistic in the case of a fork.I assume you would only lose TLB, not cache.
Jan 23 2022
On Sunday, 23 January 2022 at 12:30:03 UTC, Ola Fosheim Grøstad wrote:On Saturday, 22 January 2022 at 21:45:36 UTC, Elronnd wrote:L1 cache is often virtually addressed but physically tagged so that makes sense.On Saturday, 22 January 2022 at 19:43:33 UTC, Ola Fosheim Grøstad wrote:I wouldn't make any assumptions, what I get from a quick Google search fits with what I've read on this topic before: on ARM a TLB flush implies a cache flush, and [tldp.org](https://tldp.org/LDP/khg/HyperNews/get/memory/flush.html) describes the pattern for Linux as: flush cache, modify, flush TLB.If you change the TLB, then affected address ranges should in general be flushed although maybe this is too pessimistic in the case of a fork.I assume you would only lose TLB, not cache.
Jan 23 2022
On Saturday, 22 January 2022 at 10:05:07 UTC, rikki cattermole wrote:This isn't what I had in mind. By telling the GC about fibers, it should be able to take advantage of this information and limit the time it takes to do stop the world scanning. If it knows memory is live, it doesn't need to scan for it. If it knows that memory isn't referenced in that fiber anymore, then it only needs to confirm that it isn't anywhere else.Does C# have thread pinned GC memory? I haven't seen anything like that, other than that you need to "move" the memory to other threads in that language, but I don't really know what is going on under the hood there. Not having to do that is one of the reasons D is nice to use: manually moving memory around will have an effect on how the language looks and its ease of use. Another thing is that thread-pinned memory doesn't work well with thread pools, as any thread in a pool might tamper with the memory. In general, tracing GC doesn't scale well with large amounts of memory. I would rather go with something similar to ORC in Nim in order to reduce the tracing.
Jan 22 2022
On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:So a related question: Has anyone ever thought about "thread-local garbage collection"The big problem is that data isn't thread local (including TLS); nothing stops one thread from pointing into another thread. So any GC that depends on a barrier there is liable to look buggy. You can do it for special cases with fork like you said, or with unregistering threads like Guillaume doesI think the simple, easiest way to try this would be to just spawn multiple processes [each having their own D GC collector] and somehow share memory between them. But I have no idea if it's easy to work around the inherent "multiple process = context switch" overhead making it actually slower.I actually do exactly this with my web server, but it is easy there since web requests are supposed to be independent anyway. re context switches btw, processes and threads both have them. they aren't that different on the low level, it is just a matter of how much of the memory space is shared. default shared = thread, default unshared = process.
Jan 21 2022
On Friday, 21 January 2022 at 14:37:04 UTC, Adam Ruppe wrote:On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:Is this intended behaviour? Or was it accidental? Is it at least guaranteed that only one thread can hold a mutable reference?So a related question: Has anyone ever thought about "thread-local garbage collection"The big problem is that data isn't thread local (including TLS); nothing stops one thread from pointing into another thread. So any GC that depends on a barrier there is liable to look buggy.
Jan 21 2022
On Fri, Jan 21, 2022 at 03:16:32PM +0000, Tejas via Digitalmars-d wrote:On Friday, 21 January 2022 at 14:37:04 UTC, Adam Ruppe wrote:Immutable data is shared across threads by default. So any immutable data would need to be collected by a global GC. However, you don't always know if some data is going to end up being immutable when you allocate, e.g.:

    int[] createSomeData() pure {
        return [ 1, 2, 3 ]; // mutable when we allocate
    }

    void main() {
        // Implicitly convert from mutable to immutable because
        // createSomeData() is pure and the data is unique.
        immutable(int)[] constants = createSomeData();
        shareMyData(constants);

        // Now who should collect, per-thread GC or global GC?
        constants = null;
    }

So assume we allocate the initial array on the per-thread heap. Then it gets implicitly cast to immutable because it was unique, and shared with another thread. Now who should collect, the allocating thread's GC or the GC of the thread the data was shared with? Neither is correct: you need a global GC. At the very least, you need to synchronize across individual threads' GCs, which at least partially negates the benefit of a per-thread GC. This is just one example of several that show why it's a hard problem in the current language.

T -- One Word to write them all, One Access to find them, One Excel to count them all, And thus to Windows bind them. -- Mike Champion

On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:Is this intended behaviour? Or was it accidental? Is it at least guaranteed that only one thread can hold a mutable reference?So a related question: Has anyone ever thought about "thread-local garbage collection"The big problem is that data isn't thread local (including TLS); nothing stops one thread from pointing into another thread. So any GC that depends on a barrier there is liable to look buggy.
Jan 21 2022
On Friday, 21 January 2022 at 15:16:32 UTC, Tejas wrote:Is this intended behaviour?Yes.Is it at least guaranteed that only one thread can hold a mutable reference?No.
Jan 21 2022
On 1/21/22 05:56, Chris Katko wrote:some sort of "multiple pool [same process/thread] garbage collection"?I ended up with that design by accident: My D library had to spawn multiple D processes instead of multiple threads. This was a workaround to my inability to execute initialization functions of D shared libraries that were dynamically loaded by foreign runtimes (Python and C++ were in the complex picture). In the end, I realized that my library was using multiple D runtimes on those multiple D processes. :) I've never gotten to profiling whether my necessary inter-process communication was hurting performance. Even if it did, the now-unstopped worlds might be better in the end. I even thought about making a DConf presentation about the findings but it never happened. :) Ali
Jan 21 2022
On 1/21/2022 5:56 AM, Chris Katko wrote:So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Yes. The trouble is what happens when a pointer in one pool is cast to a pointer in another pool.
Jan 21 2022
On Fri, Jan 21, 2022 at 02:43:31PM -0800, Walter Bright via Digitalmars-d wrote:On 1/21/2022 5:56 AM, Chris Katko wrote:It almost makes one want to tag pointer types at compile-time as thread-local or global. Casting from thread-local to global would emit a call to some druntime hook to note the transfer (which, presumably, should only occur rarely). Stuff with only global references will be collected by the global GC, which can be scheduled to run less frequently (or disabled if you never do such casts). But yeah, this is a slippery slope on the slide down towards Rust... :-P It seems we just can't get any farther from where we are without starting to need managed pointer types. T -- Who told you to swim in Crocodile Lake without life insurance??So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Yes. The trouble is what happens when a pointer in one pool is cast to a pointer in another pool.
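The tagging idea above can be sketched in user code today, without compiler support. Everything below is invented for illustration (`Global`, `toGlobal`, and the hook comment are not druntime APIs); it only shows what an explicit, greppable transfer point would look like:

```d
// Hedged sketch of compile-time thread-local vs. global tagging.
// A wrapper type marks pointers known to be visible across threads.
struct Global(T)
{
    T* ptr; // a reference the global GC would have to track
}

// Hypothetical transfer function: a real implementation would call
// a druntime hook here to note that a thread-local allocation has
// escaped, so the global (less frequent) collector owns it now.
Global!T toGlobal(T)(T* local)
{
    return Global!T(local);
}

void main()
{
    int* p = new int; // starts life as thread-local
    *p = 5;
    auto g = toGlobal(p); // the explicit, auditable escape point
    assert(*g.ptr == 5);
}
```

The point is that the cast becomes a single rare call site the runtime can account for, rather than something the GC must discover by scanning.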
Jan 21 2022
On Friday, 21 January 2022 at 22:56:46 UTC, H. S. Teoh wrote:On Fri, Jan 21, 2022 at 02:43:31PM -0800, Walter Bright via Digitalmars-d wrote:Isn't going global from thread local done via `cast(shared)` though? Is that not enough to notify the compiler?[...]It almost makes one want to tag pointer types at compile-time as thread-local or global. Casting from thread-local to global would emit a call to some druntime hook to note the transfer (which, presumably, should only occur rarely). Stuff with only global references will be collected by the global GC, which can be scheduled to run less frequently (or disabled if you never do such casts). But yeah, this is a slippery slope on the slide down towards Rust... :-P It seems we just can't get any farther from where we are without starting to need managed pointer types. T
Jan 21 2022
On Friday, 21 January 2022 at 13:56:09 UTC, Chris Katko wrote:So I was going to ask this related question in my other thread but I now see it's hit over 30 replies (!) and the discussion is interesting but a separate topic. So a related question: Has anyone ever thought about "thread-local garbage collection" or some sort of "multiple pool [same process/thread] garbage collection"? The idea here is, each thread [or thread collection] would have its own garbage collector, and, be the only thread that pauses during a collection event.Instead of segregating GC by execution contexts like threads or fibers, I think it makes more sense to separate it by task instead. When making the most of multiple cores you try to limit synchronizations to task boundaries anyway. So a task itself basically runs isolated. Any memory it burns through can basically be dropped when the task is done. Of course the devil is in the details, but I think it is more promising than segregating by execution context.
Jan 22 2022
On 23/01/2022 4:14 AM, Sebastiaan Koppe wrote:Instead of segregating GC by execution contexts like threads or fibers, I think it makes more sense to separate it by task instead.For my proposal, I may call it a fiber, but in reality it can just as easily be a task. The only difference is one has a dedicated stack, the other does not.
Jan 22 2022
On Saturday, 22 January 2022 at 15:32:27 UTC, rikki cattermole wrote:For my proposal, I may call it a fiber, but in reality it can just as easily be a task. The only difference is one has a dedicated stack, the other does not.The term «task» is quite generic and covers fibers as well as other computational units, with or without a stack.
Jan 22 2022