digitalmars.D - On heap segregation, GC optimization and nogc relaxing
- deadalnix (183/183) Nov 11 2014 Hi all,
- Orvid King (33/221) Nov 11 2014 I think a combination of the C++'s standard library's approach
- Rikki Cattermole (60/229) Nov 11 2014 Humm.
- deadalnix (6/7) Nov 11 2014 yes and no. The ideas is similar, but it is not doable at library
- "Marc Schütz" <schuetzm gmx.net> (11/19) Nov 12 2014 I'm not sure. A library implementation may be feasible. For
- "Marc Schütz" <schuetzm gmx.net> (14/35) Nov 13 2014 I now remember that I played around with a related idea when
- deadalnix (7/16) Nov 13 2014 If you have a owned type, you don't need a builtin RC type. As
- Walter Bright (3/4) Nov 11 2014 Thanks for an excellent summary of the problem. I can't just read your s...
- deadalnix (5/9) Nov 11 2014 That is quite difficult to explain with drawing. Maybe I can
- "Ola Fosheim Grøstad" (71/101) Nov 12 2014 "ownership of data" is one possible solution, but not the problem.
- deadalnix (21/27) Nov 12 2014 I'm sorry to be blunt, but there is nothing actionable in your
- "Ola Fosheim Grøstad" (14/27) Nov 12 2014 My point is that you are making too many assumptions about both
- Paulo Pinto (9/13) Nov 12 2014 Given the support on Haskell, Clojure and C++ I am not sure if
- ponce (12/25) Nov 12 2014 I actually tested Haswell HLE and was underwhelmed (not the full
- "Ola Fosheim Grøstad" (7/9) Nov 12 2014 STM = software based transactional memory (without hardware
- ponce (8/14) Nov 12 2014 I was meaning HTM instead, good catch.
- "Ola Fosheim Grøstad" (27/36) Nov 12 2014 Yes, Intel style HTM is only an optimization for high contention
- "Marc Schütz" <schuetzm gmx.net> (11/66) Nov 12 2014 All this is unfortunately only true if there are no references
- deadalnix (4/7) Nov 12 2014 Yes, that is exactly why I'm listing the case where these can be
- "Marc Schütz" <schuetzm gmx.net> (9/18) Nov 13 2014 Hmm... I can't find that in what you wrote. To clarify: I'm
- "Ola Fosheim Grøstad" (7/10) Nov 13 2014 I sometimes wonder if the very pragmatic and brutal option of
- deadalnix (9/29) Nov 13 2014 You need a set of root from the TL heap. The trick being to
- "Marc Schütz" <schuetzm gmx.net> (12/20) Nov 14 2014 Consider this:
- deadalnix (6/31) Nov 14 2014 That is a well covered subject and told you what to google for as
- "Ola Fosheim Grøstad" (18/24) Nov 14 2014 It would help if you post links to articles.
- deadalnix (35/35) Nov 14 2014 ML is interesting because it emphasis immutability. For
- "Ola Fosheim Grøstad" (10/19) Nov 15 2014 Thanks for the link! I agree ML has some interesting work done
- deadalnix (5/14) Nov 15 2014 The small heap do not apply for us. We can't reproduce that part
- "Ola Fosheim Grøstad" (25/42) Nov 18 2014 Maybe, but immutable in ML means that two different immutable
- deadalnix (26/28) Nov 18 2014 What you need is for each thread to provide you a list of roots.
- Guillaume Chatelet (4/4) Nov 21 2014 What would be the next step to keep this effort going forward ?
- "Marc Schütz" <schuetzm gmx.net> (7/12) Nov 16 2014 Ok, you caught me there :-P
- Paulo Pinto (3/16) Nov 17 2014 If you mean OCaml, the runtime is being made multi-thread.
- Dmitry Olshansky (25/104) Nov 12 2014 Aye.
- deadalnix (24/41) Nov 12 2014 Yes, the unsafe road must always be open, we are a system
- Dmitry Olshansky (17/62) Nov 14 2014 Sorry, forgot to reply.
- "Marc Schütz" <schuetzm gmx.net> (10/22) Nov 14 2014 It needs to be `owned(Exception)`, otherwise, how could the
- deadalnix (3/11) Nov 14 2014 That is a very good question. I do think this should show the
- Walter Bright (3/11) Nov 13 2014 If that wrapper is automatically generated by the compiler, so the user ...
- deadalnix (5/21) Nov 13 2014 Yes, that is the intention. It means that my proposal does
- Walter Bright (8/8) Nov 17 2014 Nice article.
- deadalnix (60/71) Nov 17 2014 Ok, I'm gonna respond in reverse order, so I can use one answer
- Walter Bright (5/8) Nov 17 2014 There was a lot of skepticism initially when I proposed that ref be a st...
- deadalnix (3/13) Nov 17 2014 ref only make sense at interface boundary. That is not the same
- Dicebot (6/6) Dec 04 2014 Finally got a look at your proposal. While I do agree with many
Hi all, I want to get back on the subject of ownership and lifetime and propose some solutions. But before that, I propose to state the problem in a way that I haven't seen before (even if I have no doubt some have come to the same conclusions in the past).

The problem at hand is twofold: memory management and thread safety. Number one has been a hot topic for ages, and number two has become very hot over the past few years with the spread of multicore CPUs.

The problem at hand here is ownership of data. There are 3 roads you can go down:
 - Immutability and GC. Effectively, these 2 techniques allow you to get rid of ownership. There are advantages and drawbacks I'm going to discuss later.
 - Being unsafe and relying on convention. This is the C++ road (and a possible road in D). It allows implementing almost any scheme you want, but comes at a great cost for the developer.
 - Annotations. This is the Rust road. It also comes at a great cost for the developer, as some schemes may be non-trivial to express given the type system, but, contrary to the C++ road, it is safe.

These approaches all have some very nice things going for them, but also some killer scenarios.

Immutability+GC gives you safety while keeping interfaces simple. That is of great value. It also comes with some nice goodies: it is easy and safe to share data without bookkeeping, which allows one to fit more in cache and reduces the amount of garbage created. Most text processing apps fall into this category, and this is why D is that good at them. Another big goody is that many lock-free algorithms become possible: once you remove the need to bookkeep ownership, many operations can be implemented in an atomic manner. Additionally, it is possible to implement various GC optimizations on the immutable heap, which makes the GC generally more efficient.

But the cost is also real.
For some use cases, this means a large amount of garbage gets generated (Carmack wrote a piece on Haskell where he mentions the disastrous effect an immutable framebuffer would have: you'd have to clone it every time you draw into it, which is a no-go). GCs also tend to cause unpredictable runtime behavior, which programs with real-time constraints can have a hard time dealing with.

Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interfaces simple. The obvious drawback is that it is time consuming and error prone. It also makes a lot of things unclear, and devs choose the better-safe-than-sorry road. That means excessive copying to make sure you own the data, which is wasteful (in terms of the work for the copy itself, garbage generation and cache pressure). While this must remain an option locally for system code, it doesn't seem like the right option at program scale, and we do it in C++ simply because we have to.

Finally, annotations are a great way to combine safety and speed, but they generally come at a great cost when implementing uncommon ownership strategies, where you end up having to express complex lifetime and ownership relations.

Ideally, we want to map onto what the hardware does. So what does the hardware do? A multicore CPU has various cores, each of them with layers of cache. Cache is organized in cache lines, and each cache line can be in various modes. Actual systems are quite complex and deal with problems that are not very interesting here (like writeback), but the general idea is that every cache line is owned in one of several modes: either the line is owned by a single core, which can write to it, or the line is shared by several cores, each of them holding a local copy, but none of them able to write to it. There is an internal bus on which cores exchange cache lines with each other, along with messages to acquire a line in read or read/write mode.
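To make the cost of shared writable data concrete, here is a small D sketch (my own illustration, not from the post): two ways of incrementing a counter from many threads. The atomic version forces the cache line holding the counter to bounce between cores on every increment; accumulating thread-locally and merging once keeps the line owned by a single core for the whole loop.

```d
import core.atomic;

shared long total; // this cache line is fought over by every core

// slow: every increment acquires the line in exclusive mode
void hotLoop()
{
    foreach (i; 0 .. 1_000_000)
        atomicOp!"+="(total, 1);
}

// fast: the accumulator stays in one core's cache the whole time
void coolLoop()
{
    long local;
    foreach (i; 0 .. 1_000_000)
        ++local;
    atomicOp!"+="(total, local); // a single ownership transfer at the end
}
```

Both loops compute the same sum; the difference is purely in how often cache-line ownership has to move between cores.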
That means CPUs are good at thread-local read/write, shared immutable data, and transfer of ownership from one core to another. They are bad at shared writable data (effectively, the cache line has to bounce back and forth between cores, and all memory accesses have to be serialized instead of performed out of order).

In that world, D has a bizarro position where it uses a combination of annotations (immutable, shared) and GC. Ultimately, this is a good solution: use annotations for the common cases, and fall back on GC/unsafe code when these annotations fall short. Before going into why it is falling short, a digression on GC and the benefits of segregating the heap.

In D, the heap is almost segregated into 3 groups: thread-local, shared and immutable. These groups are very interesting for the GC:
 - The thread-local heap can be collected while disturbing only one thread. It should be possible to use different strategies in different threads.
 - The immutable heap can be collected 100% concurrently, without any synchronization with the program.
 - The shared heap is the only one that requires disturbing the whole program, but as a matter of good practice, this heap should be small anyway.

Various ML-family languages (like OCaml) have adopted a segregated heap strategy and get great benefits out of it. For instance, OCaml's GC is known to outperform Java's in most scenarios.

We are sitting on a huge GC goldmine here, but 3 things prevent us from exploiting it:
 - Exceptions. They can bubble from one thread to another and create implicit sharing.
 - Uniqueness (as it is defined now), as it allows a unique object to be merged into any heap.
 - Message passing. Ownership transfer is not possible, and so unsafe casting ensues.*

* It has to be noted that delegates allow this kind of stunt as well, but this is recognized as a bug by now and hopefully it is gonna be fixed.

D has a type qualifier system for which we pay a big price. Getting everything const correct is difficult.
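The message-passing point can be shown with today's std.concurrency (a sketch I am adding for illustration). A mutable, thread-local array cannot be sent as-is, so the only way to hand it to another thread is an unsafe cast plus the convention that the sender never touches it again:

```d
import std.concurrency;

void worker()
{
    // receive the buffer the main thread built
    auto data = receiveOnly!(immutable(int)[])();
}

void main()
{
    auto tid = spawn(&worker);

    int[] buf = [1, 2, 3];            // built mutably, in the TL heap
    // send(tid, buf);                // rejected: aliases mutable TL data
    send(tid, cast(immutable) buf);   // unsafe: the compiler can't check this
    buf = null;                       // by convention only; nothing enforces it
}
```

The cast is exactly the hole an owned qualifier would close: transfer would become a typed, checked operation instead of a convention.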
We want to get the most bang for that buck, and one of the bangs we are not far from being able to get is segregating the heap. Right now, we instead get a shitty GC and unsafe code.

Let's present a concrete example involving ownership:

pure Object foo() { ... }
immutable o = foo();

This is valid code. However, foo can do arbitrary manipulations to come up with the object, including various allocations. These allocations are mutable inside foo, which makes it impossible to allocate them on the immutable heap (a GC relying on that immutability could mess things up pretty badly). They cannot be allocated on the TL heap either, as once promoted to immutable, the data becomes shared as well.

On the other hand, ownership means that the compiler can know when things go out of scope and free them explicitly, which is a plus, as generating less garbage is always a way to improve garbage collection. The most efficient work there is is the work that does not need to be done.

I'd argue for the introduction of a basic ownership system. Something much simpler than Rust's, that does not cover all use cases. But the good thing is that we can fall back on GC or unsafe code when the system shows its limits. That means we rely less on the GC, while being able to provide a better GC. We already pay a cost at interfaces with type qualifiers; let's make the best of it!

I'm proposing to introduce a new type qualifier for owned data. It means that the throw statement now expects an owned(Throwable), that pure functions which currently return an implicitly unique object will return owned(Object), and that message passing will accept owned stuff.

The GC heap can be segregated into islands. We currently have 3 types of islands: thread-local, shared and immutable. These are built-in islands with special characteristics in the language. The new qualifier introduces a new type of island, the owned island. An owned island can only refer to other owned islands and to immutable data.
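To see why the intermediate allocations are the sticking point, here is a fleshed-out version of foo (my illustration; the body is hypothetical):

```d
pure Object foo()
{
    auto scratch = new int[1024]; // mutable scratch space, dies inside foo
    scratch[0] = 42;              // mutation rules out the immutable heap

    auto result = new Object();   // escapes, and the caller may freeze it
    return result;
}

immutable o = foo(); // result must not sit on the TL heap either:
                     // once immutable, it is implicitly shared
```

Under the proposal, everything foo allocates would land in a fresh owned island; when the result decays to immutable, the island is merged into the immutable heap, and scratch, being unreachable, is freed along with the rest of the island.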
Owned islands can be merged into any other island at any time (that is why they can't refer to TL or shared data). owned(T) can be passed around as a function parameter, returned, or stored in fields; when doing so, it is consumed. When an owned is not consumed and goes out of scope, the whole island is freed.

owned(T) can also implicitly decay into T, immutable(T) or shared(T) at any time. When that happens, a call to the runtime merges the owned island into the corresponding island. When it is passed around as owned, ownership is transferred, and all local references to the island are invalidated (using them is an error).

At the implementation level, a call to a pure function that returns an owned could look like this:

{
    IslandID __saved = gc_switch_new_island();
    scope(exit) gc_restore_island(__saved);

    call_pure_function();
}

This allows us to rely much less on the GC and allows for a better GC implementation.

@nogc. Remember? It was in the title. What does a @nogc function look like? A @nogc function does not produce any garbage and does not trigger a collection cycle. But there is no reason per se to prevent @nogc code from allocating on the GC heap, as long as you know it won't produce garbage. That means the only operations you need to ban are the ones that merge owned data into the TL, shared or immutable heaps.

This solves the problem of @nogc + Exception. As Exceptions are isolated, they can be allocated, thrown and caught in @nogc code without generating garbage. They can safely bubble out of the @nogc section of the code and still be safe.

In the same way, it opens the door for a LOT of code that is not @nogc to become so. If some code allocates memory in an owned island and returns it, it is now up to the caller to decide whether it wants it garbage collected or kept as owned (and/or made reference counted, for instance).
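Under the proposed semantics (owned is not actual D, so this is a pseudocode sketch of the proposal, not something that compiles today), a @nogc function could allocate and throw, with the caller deciding the exception's fate:

```d
// Pseudocode: assumes the proposed owned qualifier.
void mayFail(int x) @nogc
{
    if (x < 0)
        throw new Exception("negative input"); // lands in a fresh owned island
}

void caller()
{
    try
    {
        mayFail(-1);
    }
    catch (owned(Exception) e)
    {
        // Option 1: let e decay into the TL heap -> the GC takes over.
        // Option 2: consume it here; the island is freed on scope exit,
        //           so no garbage was ever produced.
    }
}
```

mayFail stays @nogc because nothing it allocates ever merges into the TL, shared or immutable heaps; the merge, if any, happens at the caller's discretion.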
The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if Andrei's proposed approach is better, I don't think this is a good one. The approach proposed here allows a lot of code to be marked @nogc while letting the caller decide. That is ultimately what we want libraries to look like.
Nov 11 2014
On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:
The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if Andrei's proposed approach is better, I don't think this is a good one. The approach proposed here allows a lot of code to be marked @nogc while letting the caller decide. That is ultimately what we want libraries to look like.

I think a combination of the C++ standard library's approach and Rust's approach would actually be the best possible. If we were to follow C++'s strategy, I think it would be important to make sure that it wouldn't require specifically adding template parameters and constraints, and would instead allow the use of a concept-like system. I think that being able to default the allocator parameter to the GC, provided the current method is not @nogc, would also be a good idea. If C++'s approach were taken, it would also be very beneficial to allow a syntax such as `auto obj = new MyClass() with allocator` and `delete obj with allocator`. I do think that the definition of @nogc would have to be slightly expanded though, to mean that any values allocated with a given allocator are also freed with that allocator before returning.

To connect back with your proposal and allow even more code to be @nogc, an owned(MyClass, allocator) object would be allowed to be returned from a @nogc function. This would transfer ownership of the data, and responsibility for its deletion, to the caller, provided that the caller is @nogc. If the caller is @nogc and fails to free the memory, DMD should produce an error. If the caller is not @nogc, then DMD would say nothing and assume that the allocator used to allocate the object will do the cleanup. This would allow far more situations to be handled by the allocation system without needing a GC, while still allowing programs that want to use the GC to do so.
@nogc would simply mean that no garbage is produced; it would not guarantee which allocator was used to perform the allocation. @nogc would also mean that no allocators other than the ones passed in by the user would be used to perform allocations. This preserves the current meaning of @nogc while opening it up for use in a much larger variety of situations.
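The suggested surface syntax, sketched out (hypothetical; neither `with allocator` nor `owned(T, allocator)` is actual D):

```d
// Hypothetical syntax from this post:
auto obj = new MyClass() with allocator;   // allocate via a user allocator
scope(exit) delete obj with allocator;     // and free through the same one

// A @nogc function handing cleanup responsibility to its caller:
owned(MyClass, Allocator) make(Allocator a) @nogc
{
    return new MyClass() with a;
}
```

The key point of the sketch is that the allocator travels with the owned value, so whoever consumes it knows how it must be freed.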
Nov 11 2014
On 12/11/2014 3:34 p.m., deadalnix wrote:Hi all, I want to get back on the subject of ownership, lifetime and propose some solution, but before, propose to state the problem in a way that haven't seen before (even if I have no doubt some have came to the same conclusion in the past). The problem at hand is double: memory management and thread safety. Number one has been a hot topic for ages, and number 2 has become very over the past years, to the widespreading of multicores CPU. The problem at hand here is ownership of data. There are 3 roads you can go about it: - immutability and GC. Effectively, these 2 technique allow you to get rid of ownership. There are advantages and drawbacks i'm going to discuss later. - Being unsafe and rely on convention. This is the C++ road (and a possible road in D). It allow to implement almost any wanted scheme, but come at great cost for the developer. - Annotations. This is the Rust road. It also come a great cost for the developer, as some schemes may be non trivial to express granted the type system, but, contrary to the C++ road, is safe. These approach all have some very nice things going on for them, but also some killer scenarios. Immutability+GC allow to have safety while keeping interfaces simple. That is of great value. It also come with some nice goodies, in the sense that is it easy and safe to shared data without bookkeeping, allowing one to fit more in cache, and reduce the amount of garbage created. Most text processing apps fall into this category and this is why D is that good at them. Another big goodies is that many lock free algorithm become possible. Once you remove the need for bookkeeping of ownership many operations can be implemented in an atomic manner. Additionally, it is possible to implement various GC optimization on immutable heap, which make the GC generally more efficient. But the cost is also real. 
For some use case, this mean having a large amount of garbage generated (Carmack wrote a piece on haskell were he mention the disastrous effect that having a framebuffer immutable would have: you'd have to clone it everytime you draw in it, which is a no go). GC also tend to cause unpredictable runtime characteristics, which programs with real time constraint can have hard time to deal with. Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interface simple. The obvious drawback is that it is time consuming and error prone. It also make a lot of things unclear, and dev choose the better safe than sorry road. That mean excessive copying to make sure one own the data, which is wasteful (in term of work for the copy itself, garbage generation and cache pressure). If this must be an option locally for system code, it doesn't seems like this is the right option at program scale and we do it in C++ simply because we have to. Finally, annotations are a great way to combine safety and speed, but generally come at a great cost when implenting uncommon ownership strategies where you ends up having to express complex lifetime and ownership relations. Ideally, we want to map with what the hardware does. So what does the hardware do ? Multicore CPU have various cores, each of them having layers of cache. Cache is organized in cache line and each cache line can be in various modes. Actual system are quite complex and deal with problems we are not very interesting here (like writeback) but the general idea is that every cache line is owned with different modes. Either the cache line is owned by a single core and can be written to, or the cache line shared by several cores, each of them having a local copy of the line, but none of them can write to. There is an internal bus where cores can exchange cache line with each other and messages to acquire cache line in read or read/write mode. 
That mean CPU are good at thread local read/write, shared immutable and transfer of ownership from one core to the other. They are bad at shared writable data (as effectively, the cache line will have to bounce back and forth between cores, and all memory access will need to be serialized instead of performed out of order). In that world, D has a bizaro position were it use a combination of annotations (immutable, shared) and GC. Ultimately, this is a good solution. Using annotation for common cases, fallback on GC/unsafe code when these annotations fall short. Before going into why it is fallign short, a digression on GC and the benefits of segregating the heap. In D, the heap is almost segregated in 3 groups: thread local, shared and immutable. These group are very interesting for the GC: - Thread local heap can be collected while disturbing only one thread. It should be possible to use different strategy in different threads. - Immutable heap can be collected 100% concurrently without any synchronization with the program. - Shared heap is the only one that require disturbing the whole program, but as a matter of good practice, this heap should be small anyway. Various ML family languages (like OCaml) have adopted segregated heap strategy and get great benefice out of it. For instance, OCaml's GC is known to outperform Java's in most scenarios. We are sitting on a huge GC goldmine here, but 3 things prevent us to exploit it: - Exceptions. They can bubble from one thread to the other and create implicit sharing. - Uniqueness (as it is defined now) as it allow for unique object to be merged with any heap. - message passing. Ownership transfert is not possible and so unsafe casting ensue. * It has to be noted that delegate allow as well for this kind of stunt, but this is recognized as a bug by now and hopefully it is gonna be fixed. D has a type qualifier system for which we pay a big price. Getting everything const correct is difficult. 
We'd want to get the most bang for the buck. One of the bangs we are not far from being able to get is segregating the heap. Right now we can't, and that means a shitty GC and unsafe code.

Let's present a concrete example using ownership:

pure Object foo() { ... }
immutable o = foo();

This is valid code. However, foo can do arbitrary manipulation to come up with the object, including various allocations. These allocations are mutable inside foo, which makes it impossible to allocate them on the immutable heap (a GC relying on this immutability could mess things up pretty badly). They also cannot be allocated on the TL heap, as once promoted to immutable, the data becomes shared as well.

On the other hand, ownership means that the compiler can know when things go out of scope and free them explicitly. Which is a plus, as generating less garbage is always a way to improve garbage collection. The most efficient work there is is the one that does not need to be done.

I'd argue for the introduction of a basic ownership system. Something much simpler than Rust's, that does not cover all use cases. But the good thing is that we can fall back on GC or unsafe code when the system shows its limits. That means we rely less on the GC, while being able to provide a better GC. We already pay a cost at the interface with type qualifiers, let's make the best of it!

I'm proposing to introduce a new type qualifier for owned data. It means that throw statements expect an owned(Throwable), that pure functions which currently return an implicitly unique object will return owned(Object), and that message passing will accept to pass around owned stuff.

The GC heap can be segregated into islands. We currently have 3 types of islands: thread local, shared and immutable. These are builtin islands with special characteristics in the language. The new qualifier introduces a new type of island, the owned island. Owned islands can only refer to other owned islands and to immutable data. 
They can be merged into any other island at any time (that is why they can't refer to TL or shared data). owned(T) can be passed around as a function parameter, returned, or stored as a field. When doing so, it is consumed. When an owned is not consumed and goes out of scope, the whole island is freed.

That means that owned(T) can implicitly decay into T, immutable(T) or shared(T) at any time. When doing so, a call to the runtime is made to merge the owned island into the corresponding island. The ownership is transferred, and all local references to the island are invalidated (using them is an error).

On an implementation level, a call to a pure function that returns an owned could look like this:

{
  IslandID __saved = gc_switch_new_island();
  scope(exit) gc_restore_island(__saved);

  call_pure_function();
}

This allows us to rely much less on the GC, and allows for a better GC implementation.

@nogc. Remember? It was in the title. What does a @nogc function look like? A @nogc function does not produce any garbage or trigger a collection cycle. There is no reason per se to prevent @nogc code from allocating on the GC heap, as long as you know it won't produce garbage. That means the only operations you need to ban are the ones that merge owned things into the TL, shared or immutable heaps.

This solves the problem of @nogc + Exception. As Exceptions are isolated, they can be allocated, thrown and caught in @nogc code without generating garbage. They can safely bubble out of the @nogc section of the code and still be safe. In the same way, it opens the door for a LOT of code that is not @nogc to become so. If the code allocates memory in an owned island and returns it, then it is up to the caller to decide whether it wants it garbage collected or kept as owned (and/or made reference counted, for instance). 
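To make the intended semantics concrete, here is a usage sketch. This is hypothetical syntax: owned(T) does not exist in D today, and makeError/promote are made-up names used purely for illustration:

```d
// Hypothetical sketch of the proposed owned(T) qualifier.
// None of this compiles today; it illustrates the intended semantics.

owned(Exception) makeError() pure @nogc
{
    return new Exception("boom"); // allocated in a fresh owned island
}

void caller() @nogc
{
    try
    {
        throw makeError(); // throw expects owned(Throwable): no garbage
    }
    catch (Exception e)
    {
        // If e is not consumed, its whole island is freed here,
        // so the @nogc guarantee holds.
    }
}

void promote()
{
    owned(Object) o = makeSomething(); // hypothetical owned factory
    immutable Object i = o; // runtime call merges the island into
                            // the immutable heap
    // o and all local references into its island are now invalidated;
    // using them would be a compile-time error.
}
```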
The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if the approach proposed by Andrei is better, I don't think it is a good one. The proposed approach allows for a lot of code to be marked as @nogc and lets the caller decide. That is ultimately what we want libraries to look like.

Humm.

import std.stdio;

struct TypeOwnerShip(T) {
    T value;
    alias value this;

    this(T)(T value) {
        this.value = value;
    }

    // implicit casts to immutable, shared?
    // on cast to immutable, shared change islands
}

T owned(T)(T value) {
    return TypeOwnerShip!T(value);
}

class Bar {
    int x;

    this(int x) pure {
        this.x = x;
    }
}

Bar foo() pure {
    return owned(new Bar(5));
}

struct IslandAllocationStrategy {
    this(ubyte v = 0) {
    }

    void opWithIn() {
        writeln("opWithIn");
        // thread local overriding
    }

    void opWithOut() {
        import std.stdio;
        writeln("opWithOut");
        // reset thread local overriding
    }
}

@property IslandAllocationStrategy island() {
    return IslandAllocationStrategy();
}

void main() {
    writeln("start");

    with(island) {
        opWithIn;
        writeln("{");

        Bar myValue = foo();
        writeln(myValue.x);

        writeln("}");
        opWithOut;
    }

    writeln("end");
}

I feel like I've suggested this, just without the CS theory.
Nov 11 2014
On Wednesday, 12 November 2014 at 03:13:20 UTC, Rikki Cattermole wrote:[...]Yes and no. The idea is similar, but it is not doable at library level if we want to get safety and the full benefit out of it, as it would require the compiler to introduce some calls to the runtime at strategic places, and it does interact with @nogc.
Nov 11 2014
On Wednesday, 12 November 2014 at 06:48:47 UTC, deadalnix wrote:On Wednesday, 12 November 2014 at 03:13:20 UTC, Rikki Cattermole wrote:I'm not sure. A library implementation may be feasible. For invalidation, the objects can be returned to their init state. This is "safe", but maybe not ideal, as a compiler error might indeed be better. Even implicit conversion to shared & immutable will be possible with multiple alias this, though it's worth discussing whether an explicit conversion isn't preferable. As for nogc, I see it as a crutch that's needed while no "real" solution is available, and as a tool for aiding transition once we have one. That said, some compiler support will definitely be necessary.[...]yes and no. The ideas is similar, but it is not doable at library level if we want to get safety and the full benefit out of it, as it would require for the compiler to introduce some call to the runtime at strategic places and it does interact with nogc.
Nov 12 2014
On Wednesday, 12 November 2014 at 12:57:58 UTC, Marc Schütz wrote:On Wednesday, 12 November 2014 at 06:48:47 UTC, deadalnix wrote:I now remember that I played around with a related idea when Andrei suggested his template solution (but without the islands part): http://wiki.dlang.org/User:Schuetzm/RC,_Owned_and_allocators The important part is that owning, RC and GC types all need to know the allocators they belong to. In my example I need that to allow Owned types to be converted to RC types. In your proposal, something similar will ultimately be needed for merging the islands into specific heaps. The nice thing is that this scheme also allows for switching allocators on the fly at runtime, without template bloat, at the cost of virtual calls for using the allocators (which is probably an acceptable compromise for an inherently expensive thing like allocation).On Wednesday, 12 November 2014 at 03:13:20 UTC, Rikki Cattermole wrote:I'm not sure. A library implementation may be feasible. For invalidation, the objects can be returned to their init state. This is "safe", but maybe not ideal, as a compiler error might indeed be better. Even implicit conversion to shared & immutable will be possible with multiple alias this, though it's worth discussing whether an explicit conversion isn't preferable. As for nogc, I see it as a clutch that's needed while no "real" solution is available, and as a tool for aiding transition once we have one. That said, some compiler support will definitely be necessary.[...]yes and no. The ideas is similar, but it is not doable at library level if we want to get safety and the full benefit out of it, as it would require for the compiler to introduce some call to the runtime at strategic places and it does interact with nogc.
Nov 13 2014
On Thursday, 13 November 2014 at 14:26:44 UTC, Marc Schütz wrote:The important part is that owning, RC and GC types all need to know the allocators they belong to. In my example I need that to allow Owned types to be converted to RC types. In your proposal, something similar will ultimately be needed for merging the islands into specific heaps. The nice thing is that this scheme also allows for switching allocators on the fly at runtime, without template bloat, at the cost of virtual calls for using the allocators (which is probably an acceptable compromise for an inherently expensive thing like allocation).If you have an owned type, you don't need a builtin RC type. As long as the wrapper that implements the RC owns the RCed object, everything is safe. Also, this construct has the nice side effect that a single reference count can own a whole island, instead of tracking each object's RC one by one.
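A sketch of such a wrapper, assuming the proposed owned qualifier existed (hypothetical code, it does not compile today; releaseIsland is a made-up runtime hook):

```d
// Hypothetical: a RefCounted wrapper owning a whole island.
// owned(T) is the proposed qualifier from this thread, not real D.
struct RefCounted(T)
{
    private owned(T) payload; // the RC cell owns the island
    private size_t* count;

    this(owned(T) value)
    {
        payload = value; // consumes the owned reference
        count = new size_t;
        *count = 1;
    }

    this(this) { if (count) ++*count; }

    ~this()
    {
        if (count && --*count == 0)
        {
            // Freeing the single owned reference releases the whole
            // island at once: no per-object counts needed.
            releaseIsland(payload); // made-up runtime hook
        }
    }

    // Accessors would have to be careful not to leak references
    // that outlive the count, e.g. by only handing out scoped views.
}
```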
Nov 13 2014
On 11/11/2014 6:34 PM, deadalnix wrote:[...]Thanks for an excellent summary of the problem. I can't just read your solution and know it works, it'll take some time.
Nov 11 2014
On Wednesday, 12 November 2014 at 06:16:34 UTC, Walter Bright wrote:On 11/11/2014 6:34 PM, deadalnix wrote:That is quite difficult to explain without drawings. Maybe I can discuss it with Andrei when he has some time and a whiteboard around.[...]Thanks for an excellent summary of the problem. I can't just read your solution and know it works, it'll take some time.
Nov 11 2014
On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:The problem at hand here is ownership of data."ownership of data" is one possible solution, but not the problem. We are facing 2 problems: 1. A performance problem: Concurrency in writes (multiple writers, one writer, periodical locking during clean up etc). 2. A structural problem: Releasing resources correctly. I suggest that the ownership focus is on the latter, to support solid non-GC implementations. Then rely on conventions for multi-threading.- Being unsafe and rely on convention. This is the C++ road (and a possible road in D). It allow to implement almost any wanted scheme, but come at great cost for the developer.All performant solutions are going to be "unsafe" in the sense that you need to select a duplication/locking level that is optimal for the characteristics of the actual application. Copying data when you have no writers is too inefficient in real applications. Hardware support for transactional memory is going to be the easy approach for speeding up locking.- Annotations. This is the Rust road. It also come a greatI think Rust's approach would favour an STM approach where you create thread local copies for processing then merge the result back into the "shared" memory.Immutability+GC allow to have safety while keeping interfaces simple. That is of great value. It also come with some nice goodies, in the sense that is it easy and safe to shared data without bookkeeping, allowing one to fit more in cache, and reduce the amount of garbage created.How does GC fit more data in the cache? A GC usually has overhead and would typically generate more cache-misses due to unreachable in-cache ("hot") memory not being available for reallocation.Relying on convention has the advantage that any scheme can be implemented without constraint, while keeping interface simple. The obvious drawback is that it is time consuming and error prone. 
It also make a lot of things unclear, and dev choose the better safe than sorry road. That mean excessive copying to make sure one own the data, which is wasteful (in term of work for the copy itself, garbage generation and cache pressure). If this must be an option locally for system code, it doesn't seems like this is the right option at program scale and we do it in C++ simply because we have to. Finally, annotations are a great way to combine safety and speed, but generally come at a great cost when implementing uncommon ownership strategies where you end up having to express complex lifetime and ownership relations.The core problem is that if you are unhappy with single-threaded applications then you are looking for high throughput using multi-threading. And in that case sacrificing performance by not using the optimal strategy becomes problematic. The optimal strategy is entirely dependent on the application and the dataset. Therefore you need to support multiple approaches: - per data structure GC - thread local GC - lock annotations of types or variables - speculative lock optimisations (transactional memory) And in the future you will also need to support the integration of GPU/co-processors into mainstream CPUs. Metal and OpenCL are only a beginning…Ideally, we want to map with what the hardware does. So what does the hardware do ?That changes over time. The current focus in upcoming hardware is on: 1. Heterogenous architecture with high performance co-processors 2. Hardware support for transactional memory Intel CPUs might have buffered transactional memory within 5 years.from one core to the other. They are bad at shared writable data (as effectively, the cache line will have to bounce back and forth between cores, and all memory access will need to be serialized instead of performed out of order).This will vary a lot. 
On x86 you can write to a whole cache line (buffered) without reading it first, and it uses a convenient cache coherency protocol (so that read/write ops are in order). This is not true for all CPUs. I agree with others that say that a heterogeneous approach, like C++, is the better alternative. If parity with C++ is important then D needs to look closer at OpenMP, but that probably goes beyond what D can achieve in terms of implementation.

Some observations:
1. If you are not to rely on conventions for sync'ing threads then you need a pretty extensive framework if you want good performance.
2. Safety will harm performance.
3. Safety with high performance levels requires a very complicated static analysis that will probably not work very well for larger programs.
4. For most applications performance will come through co-processors (GPGPU etc).
5. If hardware progresses faster than compiler development, then you will never reach the performance frontier…

I think D needs to cut down on implementation complexity and ensure that the implementation time can catch up with hardware developments. The way to do it is:
1. Accept that generally performant multi-threaded code is unsafe and application/hardware optimized.
2. Focus on making nogc single-threaded code robust and fast. And I agree that ownership is key.
3. Use semantic analysis to automatically generate a tailored runtime with application-optimized allocators.
Nov 12 2014
On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim Grøstad wrote:That changes over time. The current focus in upcoming hardware is on: 1. Heterogenous architecture with high performance co-processors 2. Hardware support for transactional memory Intel CPUs might have buffered transactional memory within 5 years.I'm sorry to be blunt, but there is nothing actionable in your comment. You are just throwing more and more into the pot until nobody knows what is in it. But ultimately, the crux of the problem is the thing quoted above. 1. No, that does not change that much over time. The implementation details are changing, and recent schemes become more complex to accommodate heterogeneous chips, but that is irrelevant here. What I've mentioned is true for all of them, and has been for at least 2 decades by now. There is no sign that this is going to change. 2. The transactional memory thing is completely orthogonal to the subject at hand, so, like the implementation details of modern chips, it doesn't belong here. In addition, the whole CPU industry is backpedaling on the transactional memory concept. It is awesome on paper, but it didn't work. There are only 2 ways to achieve good design. You remove useless things until there is obviously nothing wrong, or you add more and more until there is nothing obviously wrong. I won't follow you down the second road, so please stay on track.
Nov 12 2014
On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:I'm sorry to be blunt, but there is nothing actionable in your comment. You are just throwing more and more into the pot until nobody know what there is in. But ultimately, the crux of the problem is the thing quoted above.My point is that you are making too many assumptions about both applications and hardware.2. The transactional memory thing is completely orthogonal to the subject at hand so, as the details of implementation of modern chip, this doesn't belong here. In addition, the whole CPU industry is backpedaling on the transactional memory concept. That is awesome on the paper, but it didn't worked.STM is used quite a bit. Hardware backed TM is used by IBM. For many computationally intensive applications, high levels of parallelism are achieved using speculative computation. TM supports that.There is only 2 way to achieve good design. You remove useless things until there is obviously nothing wrong, or you add more and more until there is nothing obviously wrong. I won't follow you down the second road, so please stay on track.Good design is achieved by understanding different patterns of concurrency in applications and how they can reach peak performance in the environment (hardware). If D is locked to a narrow memory model then you can only reach high performance on a subset of applications. If D wants to support system level programming then it needs to take an open approach to the memory model.
Nov 12 2014
On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim In addition, the whole CPU industry is backpedaling on the transactional memory concept. That is awesome on the paper, but it didn't worked.Given the support on Haskell, Clojure and C++ I am not sure if they are really backpedaling on it. The Haskell bugs are supposed to have been fixed in the next generation. And there is PowerPC A2 as well. Not that I have any use for it, though. -- Paulo
Nov 12 2014
On Wednesday, 12 November 2014 at 09:56:57 UTC, Paulo Pinto wrote:On Wednesday, 12 November 2014 at 08:55:30 UTC, deadalnix wrote:I actually tested Haswell HLE and was underwhelmed (not the full STM, was just trying to get more out of some locks). The trouble with STM is that to be applicable, you need to have huge contention (else it wouldn't be a bottleneck) and a small task to do. And this use case is already well served with spinlock-guarded locks which already allows to stay in user space most of the time. That, added with the usual restrictions and gotchas for lockfree, makes it something not very life-changing at least in my limited experience.On Wednesday, 12 November 2014 at 08:38:14 UTC, Ola Fosheim In addition, the whole CPU industry is backpedaling on the transactional memory concept. That is awesome on the paper, but it didn't worked.Given the support on Haskell, Clojure and C++ I am not sure if they are really backpedaling on it. The Haskell bugs are supposed to have been fixed in the next generation. And there is PowerPC A2 as well. Not that I have any use for it, though. -- Paulo
Nov 12 2014
On Wednesday, 12 November 2014 at 11:08:41 UTC, ponce wrote:I actually tested Haswell HLE and was underwhelmed (not the full STM, was just trying to get more out of some locks).STM = software based transactional memory (without hardware support) Haswell does not have buffered transactions so you wait for the commit, but there are presentations out where Intel has put buffered transactions at around 2017… (but I would expect a delay).
Nov 12 2014
On Wednesday, 12 November 2014 at 11:19:59 UTC, Ola Fosheim Grøstad wrote:STM = software based transactional memory (without hardware support)I was meaning HTM instead, good catch.Haswell does not have buffered transactions so you wait for the commit, but there are presentations out where Intel has put buffered transactions at around 2017… (but I would expect a delay).I wasn't arguing about the current performance (which is not impressive). My point is that transactional memory has limited applicability, since locks already fit the bill well. And I'd argue the same with most lockfree structures actually.
Nov 12 2014
On Wednesday, 12 November 2014 at 11:51:11 UTC, ponce wrote:Yes, Intel style HTM is only an optimization for high contention where you already have locking code in place, since you need to take a lock as a fallback anyway. But useful for database-like situations where you might have 7 readers traversing and 1 writer updating a leaf node. It is of course difficult to say how it will perform in 2./3. generation implementations or if the overall hardware architecture will change radically (as we see in Phi and Parallella). I can easily imagine that the on-die architecture will change radically, within a decade, with the current x86 architecture remaining at a coordination level. This is the direction Phi seems to be going. In that case, maybe the performance of the x86 will be less critical (if it spends most time waiting and buffering is done in hardware).Haswell does not have buffered transactions so you wait for the commit, but there are presentations out where Intel has put buffered transactions at around 2017… (but I would expect a delay).I wasn't arguing of the current performance (which is not impressive). My point is that transactional memory has limited applicability, since locks already fits the bill well.And I'd argue the same with most lockfree structures actually.I think in general that you need to create application specific data-structures to get performance and convenience. (I seldom reuse lists and graph-like data structures for this reason, it is just easier to create them from scratch.) I also agree that you usually can get away with regular locks and very simple lockfree structures where performance matters (such as a lockfree stack where only one thread removes nodes). Where performance truly matters you probably need to split up the dataset based on the actual computations and run over the data in a batch-like SIMD way anyway. (E.g. physics simulation).
Nov 12 2014
On Wednesday, 12 November 2014 at 02:34:55 UTC, deadalnix wrote:Before going into why it is fallign short, a digression on GC and the benefits of segregating the heap. In D, the heap is almost segregated in 3 groups: thread local, shared and immutable. These group are very interesting for the GC: - Thread local heap can be collected while disturbing only one thread. It should be possible to use different strategy in different threads. - Immutable heap can be collected 100% concurrently without any synchronization with the program. - Shared heap is the only one that require disturbing the whole program, but as a matter of good practice, this heap should be small anyway.All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.I'd argue for the introduction of a basic ownership system. Something much simpler than rust's, that do not cover all uses cases. But the good thing is that we can fallback on GC or unsafe code when the system show its limits. That mean we rely less on the GC, while being able to provide a better GC. We already pay a cost at interface with type qualifier, let's make the best of it ! I'm proposing to introduce a new type qualifier for owned data. Now it means that throw statement expect a owned(Throwable), that pure function that currently return an implicitly unique object will return owned(Object) and that message passing will accept to pass around owned stuff. The GC heap can be segregated into island. We currently have 3 types of islands : Thread local, shared and immutable. These are builtin island with special characteristics in the language. The new qualifier introduce a new type of island, the owned island. owned island can only refers to other owned island and to immutable. they can be merged in any other island at any time (that is why they can't refers to TL or shared). 
owned(T) can be passed around as function parameter or returned, or stored as fields. When doing so they are consumed. When an owned is not consumed and goes out of scope, the whole island is freed. That means that owned(T) can implicitly decay into T, immutable(T), shared(T) at any time. When doing so, a call to the runtime is done to merge the owned island to the corresponding island. It is passed around as owned, then the ownership is transferred and all local references to the island are invalidated (using them is an error). On an implementation level, a call to a pure function that return an owned could look like this :

{
  IslandID __saved = gc_switch_new_island();
  scope(exit) gc_restore_island(__saved);

  call_pure_function();
}

This is nice. Instead of calling fixed helpers in Druntime, it can also make an indirect call to allow for pluggable (and runtime switchable) allocators.The solution of passing a policy at compile for allocation is close to what C++'s stdlib is doing, and even if the proposed approach by Andrei is better, I don't think this is a good one. The proposed approach allow for a lot of code to be marked as nogc and allow for the caller to decide. That is ultimately what we want libraries to look like.

+1 Andrei's approach mixes up memory allocation and memory management. Library functions shouldn't know about the latter. This proposal is clearly better and cleaner in this respect.
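The indirect-call idea could look roughly like this. All of the names below are hypothetical, sketched only to show where the virtual dispatch would sit; none of this is an actual druntime API:

```d
// Hypothetical sketch: the runtime calls through an interface
// instead of fixed helpers, so island allocators can be swapped
// at runtime without template bloat.
alias IslandID = size_t; // stand-in for whatever handle the runtime uses

interface IslandAllocator
{
    void* allocate(size_t size);     // allocate inside this island
    void mergeInto(IslandID target); // merge this island's memory
    void free();                     // drop the whole island at once
}

// The compiler-inserted gc_switch_new_island() from the quoted
// sketch would then consult a (runtime-switchable) factory:
IslandAllocator currentIslandFactory; // set by the application

// Each allocation in the owned island becomes an indirect call,
// the cost of which is small relative to allocation itself.
```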
Nov 12 2014
On Wednesday, 12 November 2014 at 12:49:41 UTC, Marc Schütz wrote:All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.Yes, that is exactly why I'm listing the case where these can be created in safe code and propose a solution to plug the hole (and which brings other benefits along the road).
Nov 12 2014
On Wednesday, 12 November 2014 at 21:15:05 UTC, deadalnix wrote:On Wednesday, 12 November 2014 at 12:49:41 UTC, Marc Schütz wrote:Hmm... I can't find that in what you wrote. To clarify: I'm talking about the fact that, for example, a thread-local heap can contain references into the immutable and shared heaps. Therefore, the immutable heap can _not_ be collected without disturbing any threads, because any TL heaps (and stacks!) can potentially have references to it. They either need to be stopped, or write barriers need to be utilized when references to immutable data are changed.All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.Yes, that is exactly why I'm listing the case where these can be created in safe code and propose a solution to plug the hole (and which brings other benefits along the road).
Nov 13 2014
On Thursday, 13 November 2014 at 10:10:34 UTC, Marc Schütz wrote:potentially have references to it. They either need to be stopped, or write barriers need to be utilized when references to immutable data are changed.I sometimes wonder if the very pragmatic and brutal option of just having GC on the main thread, and preventing all escapes of objects from the main thread without explicit ownership transfer, would solve D's GC-related performance problems. It would work for real-time applications and it would work for people who use D in a scripty fashion (but not for vibe.d).
Nov 13 2014
On Thursday, 13 November 2014 at 10:10:34 UTC, Marc Schütz wrote:On Wednesday, 12 November 2014 at 21:15:05 UTC, deadalnix wrote:You need a set of roots from the TL heap. The trick is to consider everything that is allocated AFTER you get the roots as automatically alive (you'll collect it in the next collection cycle). That way, newly allocated chunks that have references in the TL heap will be kept alive, even if you don't know about the roots. You'll find plenty of information about the details if you look into GCs for ML-family languages.On Wednesday, 12 November 2014 at 12:49:41 UTC, Marc Schütz wrote:Hmm... I can't find that in what you wrote. To clarify: I'm talking about the fact that, for example, a thread-local heap can contain references into the immutable and shared heaps. Therefore, the immutable heap can _not_ be collected without disturbing any threads, because any TL heaps (and stacks!) can potentially have references to it. They either need to be stopped, or write barriers need to be utilized when references to immutable data are changed.All this is unfortunately only true if there are no references between heaps, i.e. if the heaps are indeed "islands". Otherwise, there need to be at least write barriers.Yes, that is exactly why I'm listing the case where these can be created in safe code and propose a solution to plug the hole (and which brings other benefits along the road).
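In pseudocode, the snapshot trick might look like this. All of the names are made up; this is a sketch of the scheme described above, not an actual GC implementation:

```d
// Sketch of the "allocated-after-the-roots is alive" trick.
void collectImmutableHeap()
{
    // 1. Remember the allocation frontier. Everything allocated past
    //    this point is treated as live for this cycle and only
    //    examined in the next one.
    auto frontier = immutableHeap.allocationPointer;

    // 2. Gather each thread's current roots into the immutable heap
    //    (collected as a by-product of that thread's TL collection).
    Root[] roots = gatherRootsFromThreads();

    // 3. Mark from the roots, concurrently with the program. Any new
    //    reference a thread acquires can only come from an allocation
    //    (past the frontier, hence live) or from reading through an
    //    already-rooted immutable chain, which the mark will reach.
    mark(roots);

    // 4. Sweep only the part of the heap below the frontier.
    sweepBelow(frontier);
}
```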
Nov 13 2014
On Thursday, 13 November 2014 at 22:12:22 UTC, deadalnix wrote:You need a set of root from the TL heap. The trick being to consider everything that is allocated AFTER you get the roots as automatically alive (you'll collect this in the next collection cycle). That way, new allocated chunk that have reference in the TL heap will be kept alive, even if you don't know about the roots. You'll find plenty of information about the details if look into GC for ML family languages.Consider this:

auto i = new immutable(int);
immutable(int)* a, b;
b = i;
// GC kicks in here, scans `a` (== null)
a = b;
b = null;
// GC goes on, scans `b` (== null)
// => whoops, no reference to *i

Here, a and b are on the stack or in registers. They could also be on the TL heap.
Nov 14 2014
On Friday, 14 November 2014 at 11:46:51 UTC, Marc Schütz wrote:On Thursday, 13 November 2014 at 22:12:22 UTC, deadalnix wrote:That is a well covered subject, and I told you what to google for as well as the basic approach. Your example here simply tells me you haven't done your homework before posting. Please go look into the scientific literature about GCs for ML languages.You need a set of root from the TL heap. The trick being to consider everything that is allocated AFTER you get the roots as automatically alive (you'll collect this in the next collection cycle). That way, new allocated chunk that have reference in the TL heap will be kept alive, even if you don't know about the roots. You'll find plenty of information about the details if look into GC for ML family languages.Consider this: auto i = new immutable(int); immutable(int)* a, b; b = i; // GC kicks in here, scans `a` (== null) a = b; b = null; // GC goes on, scans `b` (== null) // => whoops, no reference to *i Here, a and b are on the stack or in registers. They could also be on the TL heap.
Nov 14 2014
On Friday, 14 November 2014 at 21:59:47 UTC, deadalnix wrote:That is a well covered subject and told you what to google for as well as the basic approach. Your example here simply told me you haven't done your homework before posting. Please go look into scientific documentation about GC for ML languages.It would help if you posted links to articles. ML is geared towards functional programming, which has different behaviour from system-level imperative programming, so I am not sure if ML is the best starting point. From https://ocaml.org/learn/tutorials/garbage_collection.html : «OCaml's garbage collector has two heaps, the minor heap and the major heap. This recognises a general principle: Most objects are small and allocated frequently and then immediately freed. These objects go into the minor heap first, which is GCed frequently. Only some objects are long lasting. These objects get promoted from the minor heap to the major heap after some time, and the major heap is only collected infrequently. The OCaml GC is synchronous. It doesn't run in a separate thread, and it can only get called during an allocation request.» Nothing about segregation here. The older MLkit, which uses a regional allocator, is interesting, but probably not what you are talking about.
Nov 14 2014
ML is interesting because it emphasizes immutability. For performance reasons, a part of it is in effect mutable and thread local. Some ML implementations are very interesting for us. But let's get back to D. To make the example simpler, let's get rid of shared (in effect, the same thing can be achieved with write barriers on shared, and it does not fundamentally affect the way it works). So we have a TL GC that runs on a regular basis. When doing so, it also collects the pointers to the immutable heap and gives them to the immutable GC as roots. Now the immutable GC works from these roots, but will consider everything allocated after it got its roots as alive. This works because the new roots to the immutable heap that can appear in the TL heap can come from: - new allocations: everything allocated after getting the roots is considered alive. - reads in the immutable heap: these already require a reference to the immutable heap, and as the chain of references is immutable, the one you read from will ultimately be scanned. - other threads (and it will be a root from their scan): an exchange from a non-scanned thread to a scanned one will register a root in the runtime. The idea is based on Doligez-Leroy's GC, but using the TL heap as the small object heap and the immutable heap as the shared one. Note that this GC was designed for ML, where most things are immutable, and this is why it works well (it only requires write barriers). This strategy would be a disaster in Java for instance, as would using the small object strategy for our TL heap. http://gallium.inria.fr/~xleroy/publi/concurrent-gc.pdf http://www.cs.tau.ac.il/~msagiv/courses/mm/doligez.ppt
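To make the root handoff concrete, here is a rough D-style pseudocode sketch of what a TL collection could do. All the names here (`ThreadHeap`, `ImmutableGC`, `snapshotAllocations`, `addRoots`) are hypothetical illustrations, not druntime API:

```d
// Hypothetical runtime sketch of the TL -> immutable GC root handoff.
void collectThreadLocal(ThreadHeap tl, ImmutableGC immGC)
{
    // Everything allocated in the immutable heap after this point is
    // conservatively treated as alive until the next cycle.
    immGC.snapshotAllocations();

    void*[] immRoots;
    foreach (obj; tl.mark())            // regular TL mark phase
        foreach (ptr; obj.pointers)
            if (immGC.owns(ptr))        // pointer into the immutable heap
                immRoots ~= ptr;

    // The immutable GC can now trace concurrently from these roots,
    // without pausing this thread again.
    immGC.addRoots(immRoots);
}
```

The key property is that the TL collector only pauses its own thread; the immutable heap is traced concurrently, with floating garbage picked up on the next cycle.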
Nov 14 2014
On Saturday, 15 November 2014 at 00:16:22 UTC, deadalnix wrote:The idea is based on Doligez-Leroy's GC, but using TL heap as the small object and immutable heap as the shared. Note that this GC is done for ML, where most things are immutable and this is why it works well (only require write barriers). This strategy would be a disaster in java for instance, or for us to use the small object strategy for TL. http://gallium.inria.fr/~xleroy/publi/concurrent-gc.pdf http://www.cs.tau.ac.il/~msagiv/courses/mm/doligez.pptThanks for the link! I agree ML has some interesting work done for it, but they make some assumptions: 1. a low portion of mutable objects 2. the small heap fits in per-core cache (<256KiB on Haswell). So the small heap is basically a region allocator that caches allocations that are likely to die, cutting down on the more costly effort to update the big heap. But I am not sure that fits system level programming; it is more typical of functional programming…
Nov 15 2014
On Saturday, 15 November 2014 at 12:52:33 UTC, Ola Fosheim Grøstad wrote:Thanks for the link! I agree ML has some interesting work done for it, but they make some assumptions: 1. a low portion of mutable objects 2. the small heap fits in per-core cache (<256KiB on Haswell). So the small heap is basically a region allocator that caches allocations that are likely to die, cutting down on the more costly effort to update the big heap. But I am not sure that fits system level programming; it is more typical of functional programming…The small heap part does not apply to us. We can't reproduce that part of the GC in D. However, the big heap part of the strategy would fit D's immutable heap very nicely.
Nov 15 2014
On Saturday, 15 November 2014 at 20:13:47 UTC, deadalnix wrote:On Saturday, 15 November 2014 at 12:52:33 UTC, Ola Fosheim Grøstad wrote:Maybe, but immutable in ML means that two different immutable objects are 100% indistinguishable if they have the same value. Quite often you have a cluster of objects (that could be an isolate) that is periodically immutable. I assume what is most interesting is whether it is immutable between two collections, and not whether it is mutable throughout its lifespan? There are two points in the powerpoint that stick out though: * «trade collection “quality” for level of synchronization - allow large amounts of floating garbage» * «trade collection “simplicity” for level of synchronization - complicated algorithm (not to mention correctness proof)» And also this: «Root enumeration * Raise a flag to signal beginning of marking * shade globals * ask mutators to shade roots * wait until all mutators answered * meanwhile - start scanning and marking» Does this mean that you need all threads (which I presume are the mutators) to be in an eventloop in order to collect? Online version of the slides: http://s0.docspal.com/files/processed/64/6863864-yusxlrni/doligez.pdfThanks for the link! I agree ML has some interesting work done for it, but they make some assumptions: 1. a low portion of mutable objects 2. the small heap fits in per-core cache (<256KiB on Haswell). So the small heap is basically a region allocator that caches allocations that are likely to die, cutting down on the more costly effort to update the big heap. But I am not sure that fits system level programming; it is more typical of functional programming…The small heap part does not apply to us. We can't reproduce that part of the GC in D. However, the big heap part of the strategy would fit D's immutable heap very nicely.
Nov 18 2014
On Tuesday, 18 November 2014 at 20:34:01 UTC, Ola Fosheim Grøstad wrote:Does this mean that you need all threads (which I presume are the mutators) to be in an eventloop in order to collect?What you need is for each thread to provide you a list of roots. You can start tracing while having an incomplete list of roots, so the level of concurrency allowed is high. As for the immutability thing, here is how it goes. When you compile to native, you can either do write barriers all the time, writing pointers through a function call into the runtime, or not do it and use memory protection to trap writes. Option 1 is usually preferred by VMs that can switch the implementation of methods when collecting, so you pay a moderate cost, and only when you collect. But if you compile AOT, you always pay that price and it is not very interesting. Option 2 is WAY more expensive and will trap even writes to values, not only pointers. But while it is expensive, it is only expensive when it traps. That means it is a very interesting choice for code compiled ahead of time that manipulates immutable data. This is very interesting in the case of ML, where the compiler sometimes chooses to mutate, but only as an optimization, and very rarely on objects that reach the big heap: if an object is long lived enough to reach that heap, most likely the compiler can't prove it is unique and thus mutable. The same strategy seems like the obvious winner for D's immutable heap, as well as for the part of the shared heap that does not contain mutable pointers.
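As a concrete illustration of option 2, the collector can revoke write permission on the immutable pages while it marks; any stray write then traps and can be recorded by a signal handler. A minimal POSIX sketch (the fault handler and its bookkeeping are omitted, and this is not how druntime currently works):

```d
import core.sys.posix.sys.mman;

// Write-protect the immutable heap for the duration of concurrent marking.
// Any mutator write to these pages now faults; a SIGSEGV handler (not
// shown) would record the dirty page and restore PROT_WRITE for it.
void protectImmutableHeap(void* base, size_t len)
{
    mprotect(base, len, PROT_READ);
}

// Once marking is done, writes become cheap again.
void unprotectImmutableHeap(void* base, size_t len)
{
    mprotect(base, len, PROT_READ | PROT_WRITE);
}
```

Because the immutable heap is essentially never written after cooking, the trap almost never fires, which is exactly why this scheme suits AOT-compiled code manipulating immutable data.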
Nov 18 2014
What would be the next step to keep this effort moving forward? A POC implementation would be a lot of work, but it would also help people detect corner cases or unsuspected interactions with some features of the language.
Nov 21 2014
On Friday, 14 November 2014 at 21:59:47 UTC, deadalnix wrote:On Friday, 14 November 2014 at 11:46:51 UTC, Marc Schütz wrote: That is a well covered subject and I told you what to google for as well as the basic approach. Your example here simply told me you haven't done your homework before posting.Ok, you caught me there :-P In my defense, I did do some (too) superficial research, and even found statements that some ML languages don't support multithreading, but no details about the GC techniques used. But your explanation makes it clear, and this is indeed a very clever idea.
Nov 16 2014
On Sunday, 16 November 2014 at 10:21:53 UTC, Marc Schütz wrote:On Friday, 14 November 2014 at 21:59:47 UTC, deadalnix wrote:On Friday, 14 November 2014 at 11:46:51 UTC, Marc Schütz wrote: That is a well covered subject and I told you what to google for as well as the basic approach. Your example here simply told me you haven't done your homework before posting.Ok, you caught me there :-P In my defense, I did do some (too) superficial research, and even found statements that some ML languages don't support multithreading, but no details about the GC techniques used. But your explanation makes it clear, and this is indeed a very clever idea.If you mean OCaml, the runtime is being made multi-threaded. http://www.ocamlpro.com/blog/2013/07/01/monthly-06.html
Nov 17 2014
12-Nov-2014 05:34, deadalnix writes:Hi all, I want to get back on the subject of ownership and lifetime and propose some solutions, but before that, I propose to state the problem in a way that I haven't seen before (even if I have no doubt some have come to the same conclusion in the past).[snip nice summary]In that world, D has a bizarro position where it uses a combination of annotations (immutable, shared) and GC. Ultimately, this is a good solution: use annotations for the common cases, and fall back on GC/unsafe code when these annotations fall short.Aye.Before going into why it is falling short, a digression on GC and the benefits of segregating the heap. In D, the heap is almost segregated into 3 groups: thread local, shared and immutable. These groups are very interesting for the GC: - The thread local heap can be collected while disturbing only one thread. It should be possible to use different strategies in different threads. - The immutable heap can be collected 100% concurrently without any synchronization with the program. - The shared heap is the only one that requires disturbing the whole program, but as a matter of good practice, this heap should be small anyway. Various ML family languages (like OCaml) have adopted a segregated heap strategy and get great benefit out of it. For instance, OCaml's GC is known to outperform Java's in most scenarios.+1000 We should take advantage of the segregated heap to make all the complexity of shared/immutable/TL finally pay off.I'd argue for the introduction of a basic ownership system. Something much simpler than Rust's, that does not cover all use cases. But the good thing is that we can fall back on GC or unsafe code when the system shows its limits. That means we rely less on the GC, while being able to provide a better GC. We already pay a cost at interfaces with type qualifiers, let's make the best of it! I'm proposing to introduce a new type qualifier for owned data. 
Now it means that the throw statement expects an owned(Throwable), that pure functions that currently return an implicitly unique object will return owned(Object), and that message passing will accept to pass around owned stuff. The GC heap can be segregated into islands. We currently have 3 types of islands: thread local, shared and immutable. These are builtin islands with special characteristics in the language. The new qualifier introduces a new type of island, the owned island.Seems sane. owned(Exception) would be implicitly assumed, i.e.: catch(Exception e){ ... } would be seen by the compiler as: catch(owned(Exception) e){ ... } What happens if I throw an l-value exception? Do I need to cast or assumeOwned it? It's easy to see how it goes with r-values, such as new Exception(...), since they are "unique expressions" whatever that means ;)An owned island can only refer to other owned islands and to immutable. They can be merged into any other island at any time (that is why they can't refer to TL or shared). owned(T) can be passed around as a function parameter or returned, or stored as a field. When doing so, it is consumed. When an owned is not consumed and goes out of scope, the whole island is freed. That means that owned(T) can implicitly decay into T, immutable(T) or shared(T) at any time. When doing so, a call to the runtime is done to merge the owned island into the corresponding island. If it is passed around as owned, then the ownership is transferred and all local references to the island are invalidated (using them is an error). On an implementation level, a call to a pure function that returns an owned could look like this:

{
    IslandID __saved = gc_switch_new_island();
    scope(exit) gc_restore_island(__saved);
    call_pure_function();
}

This allows us to rely much less on the GC and allows for a better GC implementation.I take it that owned(T) is implicitly deduced by the compiler in the case of pure functions? 
Also it seems templates should not take owned(T) into consideration and let it decay... How does owned compose with other qualifiers?nogc. Remember? It was in the title. What does a nogc function look like? A nogc function does not produce any garbage or trigger a collection cycle. There is no reason per se to prevent nogc code from allocating on the GC heap, as long as you know it won't produce garbage. That means the only operations you need to ban are the ones that merge owned things into the TL, shared or immutable heap. This solves the problem of nogc + Exception. As Exceptions are isolated, they can be allocated, thrown and caught in nogc code without generating garbage. They can safely bubble out of the nogc section of the code and still be safe.Seems absolutely cool. But doesn't allocating an exception touch the heap anyway? I take it that if I don't save the exception explicitly anywhere, the owned island is destroyed at catch scope?The same way, it opens the door for a LOT of code that is not nogc to become so. If the code allocates memory in an owned island and returns it, then it is up to the caller to decide whether it wants it garbage collected or kept as owned (and/or made reference counted, for instance). The solution of passing an allocation policy at compile time is close to what C++'s stdlib is doing, and even if the approach proposed by Andrei is better, I don't think this is a good one. The proposed approach allows a lot of code to be marked as nogc and lets the caller decide. That is ultimately what we want libraries to look like.I'm not sure I get all the details, but I like your proposal MUCH better than forcibly introducing ref-counting. -- Dmitry Olshansky
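Sketched in the proposed syntax, the nogc + Exception story would look roughly like this. Note that `owned` does not exist in today's D and current @nogc rejects such a throw; this is purely an illustration of the proposal:

```d
// Hypothetical: the allocation lands in an owned island, so no garbage
// escapes and the function could qualify as nogc under the proposal.
int parseDigit(string s) /* @nogc under the proposal */
{
    if (s.length == 0)
        throw new Exception("empty input"); // typed owned(Exception)
    return s[0] - '0';
}

void caller()
{
    try
    {
        parseDigit("");
    }
    catch (Exception e) // owned(Exception) under the proposal
    {
        // As long as e is not stored into a TL, shared or immutable
        // location, nothing is merged: the island is freed when e
        // goes out of scope, and no garbage is produced.
    }
}
```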
Nov 12 2014
On Wednesday, 12 November 2014 at 20:36:32 UTC, Dmitry Olshansky wrote:Seems sane. owned(Exception) would be implicitly assumed i.e.: catch(Exception e){ ... } would be seen by compiler as: catch(owned(Exception) e){ ... } What happens if I throw l-value exception? Do I need to cast or assumeOwned it? It's easy to see how it goes with r-values, such as new Exception(...), since they are "unique expressions" whatever that means ;)Yes, the unsafe road must always be open, we are a system programming language :)I take it that owned(T) is implicitly deduced by compiler in case of pure functions? Also it seem templates should not take owned(T) into consideration and let it decay... How does owned compose with other qualifiers?You mean what if I have an owned field in an object? In the case where you pass the owned where a TL, shared or immutable is expected, the island is merged, so the question does not make sense. An owned field in an object is interpreted as follows: - immutable => immutable - shared => owned (and can be touched only if the shared object is synchronized, which allows hiding a whole hierarchy behind a mutex. That is another selling point, but I didn't want to get into all the details as the post was already quite big). - const => const owned (essentially unusable - except via borrowing, if we ever want to go that road one day).Seems absolutely cool. But doesn't allocating exception touches heap anyway? I take it that if I don't save exception explicitly anywhere the owned island is destroyed at catch scope?Yes, it touches the heap. But as long as things are owned, they'll be freed automatically when going out of scope. That means, with that definition of things, what is forbidden in nogc code is to consume the owned in such a fashion that its island is merged into the TL, shared or immutable heap. If you don't do this, then your island will be freed when going out of scope and the GC won't need to kick in/no garbage will be produced. 
Doing so allows for relaxing the constraints on nogc and allows the same library code to be used with or without a GC.
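The shared => owned rule described above would let a whole mutable hierarchy hide behind a single mutex. A hypothetical sketch of that selling point (none of this is valid D today, and the types are made up for illustration):

```d
class Node { Node next; int value; }

// Hypothetical: a shared container whose internals live in one island.
shared class Stack
{
    private owned(Node) head; // whole node hierarchy hidden behind the mutex

    synchronized void push(owned(Node) n)
    {
        // Inside synchronized, the owned field may be touched safely:
        // no other thread can observe the island.
        n.next = head;
        head = n; // consumes n: its island is merged into head's island
    }
}
```

The point of the design is that the island boundary coincides with the lock boundary, so no per-object synchronization is needed inside.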
Nov 12 2014
13-Nov-2014 00:27, deadalnix writes:On Wednesday, 12 November 2014 at 20:36:32 UTC, Dmitry Olshansky wrote:Sorry, forgot to reply. Here is the case I wanted to check:

try{
    ...
} catch(owned(Exception) e){
    foo(e);
}

void foo(T)(T arg){
    // what would this print? Exception or owned(Exception)
    // do we bloat a bit more on qualifiers?
    pragma(msg, T);
}

Seems sane. owned(Exception) would be implicitly assumed, i.e.: catch(Exception e){ ... } would be seen by the compiler as: catch(owned(Exception) e){ ... } What happens if I throw an l-value exception? Do I need to cast or assumeOwned it? It's easy to see how it goes with r-values, such as new Exception(...), since they are "unique expressions" whatever that means ;)Yes, the unsafe road must always be open, we are a system programming language :)I take it that owned(T) is implicitly deduced by the compiler in the case of pure functions? Also it seems templates should not take owned(T) into consideration and let it decay... How does owned compose with other qualifiers?You mean what if I have an owned field in an object? In the case where you pass the owned where a TL, shared or immutable is expected, the island is merged, so the question does not make sense.An owned field in an object is interpreted as follows: - immutable => immutable - shared => owned (and can be touched only if the shared object is synchronized, which allows hiding a whole hierarchy behind a mutex. That is another selling point, but I didn't want to get into all the details as the post was already quite big). - const => const owned (essentially unusable - except via borrowing, if we ever want to go that road one day).Thanks, looks sane.-- Dmitry OlshanskySeems absolutely cool. But doesn't allocating exception touches heap anyway? I take it that if I don't save exception explicitly anywhere the owned island is destroyed at catch scope?Yes, it touches the heap. But as long as things are owned, they'll be freed automatically when going out of scope. 
That means, with that definition of things, what is forbidden in nogc code is to consume the owned in such a fashion that its island is merged into the TL, shared or immutable heap. If you don't do this, then your island will be freed when going out of scope and the GC won't need to kick in/no garbage will be produced. Doing so allows for relaxing the constraints on nogc and allows the same library code to be used with or without a GC.
Nov 14 2014
On Friday, 14 November 2014 at 19:02:52 UTC, Dmitry Olshansky wrote:Here is the case I wanted to check: try{ ... } catch(owned(Exception) e){ foo(e); } void foo(T)(T arg){ // what would this print? Exception or owned(Exception) // do we bloat a bit more on qualifiers? pragma(msg, T); }It needs to be `owned(Exception)`, otherwise, how could the compiler know how to treat it correctly? But declaring foo() in that way would be unhelpful, because it would move the exception on calling the function, which is usually not desired. What you want here, instead, is borrowing: void foo(T)(scope(T) arg) { pragma(msg, T); }
Nov 14 2014
On Friday, 14 November 2014 at 20:22:13 UTC, Marc Schütz wrote:It needs to be `owned(Exception)`, otherwise, how could the compiler know how to treat it correctly? But declaring foo() in that way would be unhelpful, because it would move the exception on calling the function, which is usually not desired. What you want here, instead, is borrowing: void foo(T)(scope(T) arg) { pragma(msg, T); }That is a very good question. I do think this should show the type of T, without the owned qualifier.
Nov 14 2014
On 11/11/2014 6:34 PM, deadalnix wrote:On an implementation level, a call to a pure function that return an owned could look like this : { IslandID __saved = gc_switch_new_island(); scope(exit) gc_restore_island(__saved); call_pure_function(); } This allow us to rely much less on the GC and allow for a better GC implementation.If that wrapper is automatically generated by the compiler, so the user doesn't have to mess with it, it could be workable.
Nov 13 2014
On Friday, 14 November 2014 at 01:05:13 UTC, Walter Bright wrote:On 11/11/2014 6:34 PM, deadalnix wrote:Yes, that is the intention. It means that my proposal increases the complexity of the language only moderately, but increases the complexity of the runtime (most specifically the GC) significantly.On an implementation level, a call to a pure function that returns an owned could look like this:

{
    IslandID __saved = gc_switch_new_island();
    scope(exit) gc_restore_island(__saved);
    call_pure_function();
}

This allows us to rely much less on the GC and allows for a better GC implementation.If that wrapper is automatically generated by the compiler, so the user doesn't have to mess with it, it could be workable.
Nov 13 2014
Nice article. Some observations: 1. I don't understand the difference between 'unique', 'owned' and 'isolated'. (Some other systems use 'isolated'.) Is there a difference? 2. I suspect that 'owned' does not need to be a type qualifier - it can be a storage class much like 'ref' is. This makes for a much simpler implementation. 3. Taking it a step further, I suspect that 'owned' can be implemented as a library type, ala owned!T, with only a smidgen of compiler help.
Nov 17 2014
On Monday, 17 November 2014 at 08:04:44 UTC, Walter Bright wrote:Nice article. Some observations: 1. I don't understand the difference between 'unique', 'owned' and 'isolated'. (Some other systems use 'isolated'.) Is there a difference? 2. I suspect that 'owned' does not need to be a type qualifier - it can be a storage class much like 'ref' is. This makes for a much simpler implementation. 3. Taking it a step further, I suspect that 'owned' can be implemented as a library type, ala owned!T, with only a smidgen of compiler help.Ok, I'm gonna respond in reverse order, so I can use one answer as a starting point for the next one. 3. This is not doable as a library type. The proposal interacts with nogc, pure functions and exception handling. There is also no way to make it safe as a library type; consider:

static A statica;

class A {
    void foo() {
        statica = this; // the owned object escapes its island
    }
}

owned!A owneda = ...
owneda.foo();

We want the compiler to be able to determine what is owned as much as possible and insert the right runtime magic calls to make that work. 2. I'm not sure that is enough to make it work. It is worth investigating, but it is gonna be much more limited than what I have in mind. How do you see this working with fields in structs/classes, for instance? 1. owned is simply a refined version of isolated. The research team coming up with isolated did it for concurrency reasons, and it is useful for that, but my proposal includes a memory management aspect that isolated lacks. So I figured I'd come up with a better name. It is different from unique in the sense that there can be multiple references to the owned object. For instance:

class A {
    A next;
}

pure auto foo() {
    auto a1 = new A();
    auto a2 = new A();
    a1.next = a2;
    a2.next = a1;
    return a1;
}

Here the returned reference is owned. But it is not unique (a2 also has a reference to that object). Owned indicates that you enter into a new island. 
Everything reached transitively from there is assumed to be in the same island unless you cross a new owned or immutable reference. That means code like:

owned(A) bar() {
    auto a = foo();
    auto next = a.next;
    // At this point, we have 2 local references to the island.
    // Returning is an operation that consumes. It would invalidate all
    // local references to that island. That is ok here, as we are
    // returning and not using them anyway.
    return a.next;
}

is valid. The type system here is not checking that a reference to an object is unique, but that the owner of the island is known (and the island only has one owner at a time). It is a more generic and more powerful construct, and I don't think it comes at extra complexity for the user compared to unique. However, it definitely means more work on our side.
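The consume-on-decay rules can be illustrated with a short hypothetical snippet (proposed syntax, not current D; `A` and `foo` are the types from the example above):

```d
void use()
{
    owned(A) a = foo();   // this scope owns the freshly built island
    immutable(A) i = a;   // decay: the island is merged into the
                          // immutable heap by a runtime call
    // a (and any other local reference into that island) is now
    // invalidated; using it would be a compile-time error.
}

void drop()
{
    owned(A) a = foo();
    // Not consumed: the whole island is freed when a goes out of
    // scope, with no GC involvement and no garbage produced.
}
```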
Nov 17 2014
On 11/17/2014 1:57 PM, deadalnix wrote:2. I'm not sure if that is enough to make it work. It is worth investigating. It is gonna be much more limited than what I have in mind.There was a lot of skepticism initially when I proposed that ref be a storage class rather than a type constructor, but it turned out that the problems were resolvable and the end design was an unqualified win. So I think we can do this - at a minimum we need to exhaust the possibility.
Nov 17 2014
On Tuesday, 18 November 2014 at 02:39:48 UTC, Walter Bright wrote:On 11/17/2014 1:57 PM, deadalnix wrote:ref only makes sense at interface boundaries. That is not the same thing.2. I'm not sure if that is enough to make it work. It is worth investigating. It is gonna be much more limited than what I have in mind.There was a lot of skepticism initially when I proposed that ref be a storage class rather than a type constructor, but it turned out that the problems were resolvable and the end design was an unqualified win. So I think we can do this - at a minimum we need to exhaust the possibility.
Nov 17 2014
Finally got a look at your proposal. While I do agree with many of the initial statements and, specifically, the proposal for heap segregation, the proposed semantics of `owned` do leave me skeptical. It is effectively a more refined/mature approach to the "cooking of immutables" concept, and Marc's proposal, while being more complicated semantics-wise, allows for much more than that.
Dec 04 2014
On Thursday, 4 December 2014 at 14:23:13 UTC, Dicebot wrote:Finally got a look at your proposal. While I do agree with many of the initial statements and, specifically, the proposal for heap segregation, the proposed semantics of `owned` do leave me skeptical. It is effectively a more refined/mature approach to the "cooking of immutables" concept, and Marc's proposal, while being more complicated semantics-wise, allows for much more than that.I don't think this is solving the same problem as Marc's proposal, so I'm not sure how comparing them makes sense. Marc's proposal is about manipulating data without having ownership. This defines ownership. This proposal adds some complexity, but I do think it is a winner. Right now we have various type qualifiers (mutable, const, immutable, shared, const shared, inout, inout shared). This comes at a cost, and ultimately, as the proposal interacts with them, you want to compare the total complexity with the total benefit. In one case, you have 7 qualifiers. You get a mostly safe language out of them. With the proposal, you have 8 qualifiers. You get a safe language out of it + many opportunities to optimize + added expressiveness (i.e. interaction of GC and nogc code, ownership transfer, safe reference counting, extension of the possibilities in nogc code, in a similar fashion as weak purity allows for extra expressiveness in pure code). If you allow me a non politically correct metaphor, would you buy a $1 condom with a hole in it, or a $1.14 one that does not?
Dec 04 2014
On Thursday, 4 December 2014 at 23:22:04 UTC, deadalnix wrote:I don't think this is solving the same problem as Marc's proposal, so I'm not sure how comparing them makes sense. Marc's proposal is about manipulating data without having ownership. This defines ownership.Indeed. But both combined add too much complexity to be feasible, and thus we need to decide which problems are more important to solve. I think the one from Marc has wider application, while elements of yours may be more suitable as a hidden implementation detail. Though now there is also DIP69, and I need to seriously shake them all through my head before having any true opinion :PThis proposal adds some complexity, but I do think it is a winner. Right now we have various type qualifiers (mutable, const, immutable, shared, const shared, inout, inout shared). This comes at a cost, and ultimately, as the proposal interacts with them, you want to compare the total complexity with the total benefit.I respectfully can't agree that the shared qualifier truly exists in the current language implementation, so my perspective is a bit different. Also, const/inout don't actually tell anything about how the underlying data is managed and serve as secondary utility qualifiers.
Dec 05 2014