digitalmars.D - New pointer type for GC
- Etienne Cimon (23/23) May 26 2014 I've been looking at the GC and found that the main problem is that
- Brian Schott (3/6) May 26 2014 I don't think we want to do that.
- Etienne Cimon (4/10) May 26 2014 Ah, maybe I wasn't clear but this meant that if the void' is not used
- Etienne Cimon (3/6) May 26 2014 Forgot to mention, but this not only avoids stopping the whole world,
- Daniel Murphy (2/4) May 26 2014 Sounds like never gonna happen.
- Etienne Cimon (4/8) May 26 2014 In terms of logic it's not that complicated, I could change DMD,
- "Ola Fosheim Grøstad" (10/14) May 27 2014 On 64 bit platform 8 bytes is sufficient if you control the
- Etienne (6/14) May 27 2014 That's true, though you still need the thread ID for references to
- "Ola Fosheim Grøstad" (5/7) May 27 2014 I am not really sure how useful references to gc-pointers is. I
- Etienne (15/22) May 27 2014 I think the GC is the future of D considering it's embedded to the very
- "Ola Fosheim Grøstad" (7/16) May 27 2014 Well, but then I think you should be required to do manual
- Etienne (13/29) May 27 2014 You're right, it's obviously easier to keep it as the same pointer
- "Ola Fosheim Grøstad" (7/10) May 27 2014 That is an option, and having a hijacked malloc would probably
- Bastiaan Veelo (4/18) May 28 2014 This would also help implementing weak references, am I right?
- Dicebot (3/3) May 28 2014 Big language change which does not fix any fundamental issue. I
- "Ola Fosheim Grøstad" (5/6) May 28 2014 Having a GC pointer type + several other mechanisms could reduce
- Dicebot (11/17) May 28 2014 No this is simply annoying problem. We are unlikely to break
- "Ola Fosheim Grøstad" (7/10) May 28 2014 A niche solution is fine by me. Etienne has expressed interest in
- Dicebot (4/14) May 28 2014 Get this into upstream and you will have dozens unhappy about
- "Ola Fosheim Grøstad" (11/13) May 28 2014 It doesn't have to be in upstream. It could be an experimental
- Manu via Digitalmars-d (2/12) May 31 2014 I would switch to a D fork in an instant if it satisfied my requirements...
- "Ola Fosheim Grøstad" (14/16) Jun 04 2014 Maybe the best approach is to
- Sean Kelly (4/17) Jun 04 2014 I think in SafeD it might be possible to just make this
I've been looking at the GC and found that the main problem is that there's no clear information about the pointers. At least smart pointers have some info inside them but GC pointers are completely plain 4-8 bytes and nothing else. The GC is very fast even if it needs to look up this info, but I believe it wouldn't stay low-CPU on a server with 128 GB of RAM and 3 GB/s of memory traffic with a wide range of memory segment sizes.

I think a decent proposal would be to:

1- Introduce a new GC pointer type, e.g. a void' (it's an apostrophe), used also in classes, which implicitly converts to void* by removing the last bytes (which contain the info). This pointer contains the Pool ID of the underlying memory.
2- For reference pointers to a GC pointer, &void' would add a thread ID and magic number to better identify them during collection, and to avoid stopping the whole world to dereference them.
3- The space (4 bytes?) added by the new pointer size could be saved with tighter storage bins. E.g. no more storing 65 bytes in a 128-byte bin, but the bucket would go from array to AVL tree, which is a decent trade-off for all the O(1) searches during collection.

The downside is that adding roots would force falling back on the previous/slower searches, so it's either GC or no GC. Also, everything in D would become a ' pointer rather than * (which would then be legacy).

I think everything everywhere would have to change for this to be possible.
May 26 2014
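The fat-pointer idea in the post above can be sketched in C. This is only an illustration under stated assumptions: the struct layout, the names `gc_ptr`, `gc_pool`, `gc_to_raw`, `gc_pool_of` and `gc_slot_of`, and the two example pools are all hypothetical and are not taken from druntime or from the proposal itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fat GC pointer: a raw address plus the ID of the pool
   that owns it. Layout and names are illustrative only. */
typedef struct {
    void    *addr;     /* what a plain void* would hold */
    uint32_t pool_id;  /* index into the collector's pool table */
} gc_ptr;

/* Hypothetical per-pool metadata kept by the collector. */
typedef struct {
    uintptr_t base;      /* address of the first byte of the pool */
    size_t    size;      /* pool length in bytes */
    size_t    bin_size;  /* size of each slot in this pool */
} gc_pool;

#define POOL_COUNT 2
static const gc_pool pools[POOL_COUNT] = {
    { 0x10000, 0x1000, 64 },   /* made-up pool of 64-byte slots */
    { 0x20000, 0x1000, 128 },  /* made-up pool of 128-byte slots */
};

/* The "implicit conversion" to void*: drop the extra bytes. */
static void *gc_to_raw(gc_ptr p) { return p.addr; }

/* O(1) pool lookup during marking: index the table, no search. */
static const gc_pool *gc_pool_of(gc_ptr p) {
    assert(p.pool_id < POOL_COUNT);
    return &pools[p.pool_id];
}

/* Slot index of the pointed-to object inside its pool. */
static size_t gc_slot_of(gc_ptr p) {
    const gc_pool *pool = gc_pool_of(p);
    return ((uintptr_t)p.addr - pool->base) / pool->bin_size;
}
```

Note that on typical 64-bit ABIs this struct pads to 16 bytes rather than 12, so the "4 bytes?" estimate in the post would depend on how the extra field is packed.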
On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
> void' (it's an apostrophe)

You mean the beginning of a character literal?

> I think everything everywhere would have to change for this to be possible.

I don't think we want to do that.
May 26 2014
On 2014-05-26 23:15, Brian Schott wrote:
> On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
>> void' (it's an apostrophe)
>
> You mean the beginning of a character literal?
>
>> I think everything everywhere would have to change for this to be possible.
>
> I don't think we want to do that.

Ah, maybe I wasn't clear, but this meant that if the void' is not used entirely everywhere, it falls back on the old GC algorithms. This isn't breaking; it's a fully backwards-compatible idea.
May 26 2014
On 2014-05-26 22:52, Etienne Cimon wrote:
> 2- For reference pointers to a GC pointer, &void' would add a thread ID and magic number to better identify them during collection, and to avoid stopping the whole world to dereference them

Forgot to mention, but this not only avoids stopping the whole world, but also allows parallel collection (multi-threading).
May 26 2014
"Etienne Cimon" wrote in message news:lm0um0$tgh$1 digitalmars.com...
> I think everything everywhere would have to change for this to be possible.

Sounds like never gonna happen.
May 26 2014
On 2014-05-26 23:19, Daniel Murphy wrote:
> "Etienne Cimon" wrote in message news:lm0um0$tgh$1 digitalmars.com...
>> I think everything everywhere would have to change for this to be possible.
>
> Sounds like never gonna happen.

In terms of logic it's not that complicated; I could change DMD, druntime, phobos myself for it. The main problem is that the apostrophe is really a major cultural change.
May 26 2014
On Tuesday, 27 May 2014 at 03:26:33 UTC, Etienne Cimon wrote:
> On 2014-05-26 23:19, Daniel Murphy wrote:
>> "Etienne Cimon" wrote in message news:lm0um0$tgh$1 digitalmars.com...
>>> I think everything everywhere would have to change for this to be possible.
>>
>> Sounds like never gonna happen.
>
> In terms of logic it's not that complicated; I could change DMD, druntime, phobos myself for it. The main problem is that the apostrophe is really a major cultural change.

Please, no apostrophe. It'll mess up syntax highlighters, and possibly indenters.
May 27 2014
On 2014-05-27 9:52 AM, Idan Arye wrote:
> Please, no apostrophe. It'll mess up syntax highlighters, and possibly indenters.

    assert(sizeof(ptr) == size_t + 3);
    assert(szptr_t == sizeof(ptr));
    szptr_t ptr2Val = cast(szptr_t) &ptr;
    char magicNum = ptr2.magic;
    dchar threadId = ptr2.thread;
    char[3] poolId = ptr.pool;
May 27 2014
On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
> I've been looking at the GC and found that the main problem is that there's no clear information about the pointers. At least smart pointers have some info inside them but GC pointers are completely plain 4-8 bytes and nothing else.

On a 64-bit platform 8 bytes is sufficient if you control the allocator:

1. Avoid allocating non-GC memory from a specific address range.
2. Set a max size for GC-allocated objects.

Then the test becomes this:

    if ((ptr & NONGCMASK) == 0) {
        heapinfo_ptr = ptr & MASK;
        // process ptr
    }
May 27 2014
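The mask test in the post above can be made concrete with a small C sketch. The specific numbers are assumptions chosen for illustration: here the GC is imagined to own every address below 2^40, and each GC object is imagined to live inside a 1 MiB-aligned block whose header sits at the block start. `NONGCMASK` and `MASK` are the names from the post; `is_gc_pointer` and `heapinfo_of` are hypothetical helpers, and the null check is an addition not present in the original test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define GC_RANGE_BITS 40  /* GC owns addresses below 2^40 */
#define GC_BLOCK_BITS 20  /* heap blocks are 1 MiB, naturally aligned */

/* Bits that are set only in addresses outside the GC range. */
#define NONGCMASK (~((UINT64_C(1) << GC_RANGE_BITS) - 1))
/* Clears the offset within a block, leaving the block's base address. */
#define MASK      (~((UINT64_C(1) << GC_BLOCK_BITS) - 1))

/* True if ptr falls inside the GC-reserved range. Null is rejected
   explicitly because it would otherwise pass the mask test. */
static bool is_gc_pointer(uint64_t ptr) {
    return ptr != 0 && (ptr & NONGCMASK) == 0;
}

/* Address of the heap-info header for the block containing ptr. */
static uint64_t heapinfo_of(uint64_t ptr) {
    assert(is_gc_pointer(ptr));
    return ptr & MASK;
}
```

The max-size rule from the post is what makes `heapinfo_of` valid: an object that spanned more than one block would have interior pointers that mask down to the wrong header.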
On 2014-05-27 3:56 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:
> On a 64-bit platform 8 bytes is sufficient if you control the allocator:
>
> 1. Avoid allocating non-GC memory from a specific address range.
> 2. Set a max size for GC-allocated objects.
>
> Then the test becomes this:
>
>     if ((ptr & NONGCMASK) == 0) {
>         heapinfo_ptr = ptr & MASK;
>         // process ptr
>     }

That's true, though you still need the thread ID for references to pointers, and you need to be able to pass those pointers to C. If that's done manually, you end up sanitizing the pointers too often and it becomes boilerplate.
May 27 2014
On Tuesday, 27 May 2014 at 13:58:26 UTC, Etienne wrote:
> That's true, though you still need the thread ID for references to pointers, and you need to be able to pass those pointers to C.

I am not really sure how useful references to GC pointers are. I certainly would trade them in for multiple return values.

I also think it is reasonable to ban transfer of GC mem to C code if all GC mem is accounted for with gc-typed pointers...
May 27 2014
On 2014-05-27 10:18 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:
> On Tuesday, 27 May 2014 at 13:58:26 UTC, Etienne wrote:
>> That's true, though you still need the thread ID for references to pointers, and you need to be able to pass those pointers to C.
>
> I am not really sure how useful references to GC pointers are. I certainly would trade them in for multiple return values.
>
> I also think it is reasonable to ban transfer of GC mem to C code if all GC mem is accounted for with gc-typed pointers...

I think the GC is the future of D considering it's embedded in the very core of the language, and compatibility with C code is ... elementary.

Also, thread IDs in ptr references are to the GC as ref counts are to the smart pointers. If you remove the refCount from smart pointers, you end up scanning the whole memory to count them, don't you? So then, why remove the thread ID from GC references, if only to look for them in each thread? You slow the GC down by as much total memory there is in all threads vs the avg in a thread, AND you remove parallel collection - by not having the Thread ID in gc ptr references.

So you understand that's exactly why the GC has to stop the world, and no gaming platform will ever turn to the default behavior of a language if it stops its world. As a matter of fact, I can't see any other way of fixing the GC than adding the Thread ID in there :/
May 27 2014
On Tuesday, 27 May 2014 at 14:42:34 UTC, Etienne wrote:
> I think the GC is the future of D considering it's embedded in the very core of the language, and compatibility with C code is ... elementary.

Well, but then I think you should be required to do manual tracking while it is being retained by C code. Basically a ref counter that keeps it marked reachable by the GC until released.

> You slow the GC down by as much total memory there is in all threads vs the avg in a thread, AND you remove parallel collection - by not having the Thread ID in gc ptr references

Not if you restrict the GC heap to a set of blocks. You can also keep thread info in the heap memory block.

> behavior of a language if it stops its world. As a matter of fact, I can't see any other way of fixing the GC than adding the Thread ID in there :/

By having multiple local GCs?
May 27 2014
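The "ref counter that keeps it marked reachable" suggestion in the post above could look roughly like the following C sketch. Everything here is hypothetical: the fixed-size table, the linear search, and the names `gc_pin`, `gc_unpin` and `gc_is_pinned` are assumptions made for illustration, not an existing runtime API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PINS 16  /* arbitrary capacity for this sketch */

/* One entry per object currently retained by C code. */
typedef struct {
    void    *obj;
    unsigned count;  /* 0 means the entry is free */
} pin_entry;

static pin_entry pin_table[MAX_PINS];

/* Find the live entry for obj, or NULL if obj is not pinned. */
static pin_entry *pin_find(void *obj) {
    for (size_t i = 0; i < MAX_PINS; i++)
        if (pin_table[i].count > 0 && pin_table[i].obj == obj)
            return &pin_table[i];
    return NULL;
}

/* Call before handing obj to C code. Returns false if the table is full. */
static bool gc_pin(void *obj) {
    pin_entry *e = pin_find(obj);
    if (e) { e->count++; return true; }
    for (size_t i = 0; i < MAX_PINS; i++) {
        if (pin_table[i].count == 0) {
            pin_table[i].obj = obj;
            pin_table[i].count = 1;
            return true;
        }
    }
    return false;
}

/* Call once the C side has released obj. */
static void gc_unpin(void *obj) {
    pin_entry *e = pin_find(obj);
    assert(e != NULL);  /* unpinning an object that was never pinned is a bug */
    e->count--;
}

/* A collector would treat every pinned object as a root while marking. */
static bool gc_is_pinned(void *obj) { return pin_find(obj) != NULL; }
```

A real implementation would need a growable, thread-safe table; the point of the sketch is only that the object stays reachable for as long as its pin count is above zero.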
On 2014-05-27 10:54 AM, "Ola Fosheim Grøstad" <ola.fosheim.grostad+dlang gmail.com> wrote:
> On Tuesday, 27 May 2014 at 14:42:34 UTC, Etienne wrote:
>> I think the GC is the future of D considering it's embedded in the very core of the language, and compatibility with C code is ... elementary.
>
> Well, but then I think you should be required to do manual tracking while it is being retained by C code. Basically a ref counter that keeps it marked reachable by the GC until released.
>
>> You slow the GC down by as much total memory there is in all threads vs the avg in a thread, AND you remove parallel collection - by not having the Thread ID in gc ptr references
>
> Not if you restrict the GC heap to a set of blocks. You can also keep thread info in the heap memory block.
>
>> behavior of a language if it stops its world. As a matter of fact, I can't see any other way of fixing the GC than adding the Thread ID in there :/
>
> By having multiple local GCs?

You're right, it's obviously easier to keep it as the same pointer syntax but hijack the stdlib malloc functions to forcibly go through the GC. If the GC controls everything, you can keep the info in 8-byte pointers.

- The GC always returns a libc-incompatible pointer value
- Dereferencing should call the resolver from the GC to process it with the libc-compatible value (possibly just removing the last couple of bytes)
- Sending a pointer through extern(C) should call the sanitizer, which resolves the real pointer through the GC and sends that
- The last bytes of a pointer would contain the thread ID for void** and the pool ID for void*
- This would only work on x64 platforms
May 27 2014
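One way to read the bullet list above is as a tagged 64-bit pointer. The C sketch below assumes that user-space addresses fit in the low 48 bits, which is common on x86-64 but is not guaranteed everywhere, and stores a 16-bit pool ID in the remaining high bits. The names `gc_word`, `gc_tag`, `gc_pool_id` and `gc_resolve`, and the choice of the high bits for the tag, are assumptions for illustration; the post only says "the last bytes".

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 48                                  /* assumed address width */
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)    /* low 48 bits */

/* Tagged pointer value: not usable by libc until resolved. */
typedef uint64_t gc_word;

/* What the allocator would hand out: address plus pool ID in the high bits. */
static gc_word gc_tag(void *raw, uint16_t pool_id) {
    uint64_t addr = (uint64_t)(uintptr_t)raw;
    assert((addr & ~ADDR_MASK) == 0);  /* address must fit in 48 bits */
    return ((uint64_t)pool_id << TAG_SHIFT) | addr;
}

/* Read the pool ID without touching any collector data structure. */
static uint16_t gc_pool_id(gc_word w) {
    return (uint16_t)(w >> TAG_SHIFT);
}

/* The resolver/sanitizer: strip the tag to get a libc-compatible pointer.
   This is what would run on dereference and before an extern(C) call. */
static void *gc_resolve(gc_word w) {
    return (void *)(uintptr_t)(w & ADDR_MASK);
}
```

The cost the thread worries about is visible here: every dereference and every call into C has to go through `gc_resolve`, so the compiler would have to insert it automatically for the scheme to be practical.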
On Tuesday, 27 May 2014 at 16:47:38 UTC, Etienne wrote:
> You're right, it's obviously easier to keep it as the same pointer syntax but hijack the stdlib malloc functions to forcibly go through the GC.

That is an option, and having a hijacked malloc would probably also make it possible to optimize out unnecessary allocations as well as inline allocations.

If you have a GC-pointer type and ban transitions to regular pointers unless they are borrowed pointers, then that would also be ok (you don't need to hijack malloc then).
May 27 2014
On Tuesday, 27 May 2014 at 16:47:38 UTC, Etienne wrote:
> You're right, it's obviously easier to keep it as the same pointer syntax but hijack the stdlib malloc functions to forcibly go through the GC. If the GC controls everything, you can keep the info in 8-byte pointers.
>
> - The GC always returns a libc-incompatible pointer value
> - Dereferencing should call the resolver from the GC to process it with the libc-compatible value (possibly just removing the last couple of bytes)
> - Sending a pointer through extern(C) should call the sanitizer, which resolves the real pointer through the GC and sends that
> - The last bytes of a pointer would contain the thread ID for void** and the pool ID for void*
> - This would only work on x64 platforms

This would also help implementing weak references, am I right? Which then come in handy when improving std.signals?

Bastiaan.
May 28 2014
Big language change which does not fix any fundamental issue. I think at this stage of language development it is better to not even discuss those ;)
May 28 2014
On Wednesday, 28 May 2014 at 14:16:56 UTC, Dicebot wrote:
> Big language change which does not fix any fundamental issue.

Having a GC pointer type + several other mechanisms could reduce the amount of scanned memory to a level where it slips below the "pain threshold" for interactive apps. That's a fundamental issue for anything that is not batch.
May 28 2014
On Wednesday, 28 May 2014 at 17:21:18 UTC, Ola Fosheim Grøstad wrote:
> On Wednesday, 28 May 2014 at 14:16:56 UTC, Dicebot wrote:
>> Big language change which does not fix any fundamental issue.
>
> Having a GC pointer type + several other mechanisms could reduce the amount of scanned memory to a level where it slips below the "pain threshold" for interactive apps. That's a fundamental issue for anything that is not batch.

No, this is simply an annoying problem. We are unlikely to break anything to fix annoyances (even huge ones). Fundamental issues in my opinion are those that result in type system holes or make certain common/desired code impossible without resorting to a lot of assembly magic. Or areas that are complicated beyond explainable.

Adding a GC pointer type does not enable anything that you can't do right now for high-level applications and does not help low-level applications at all. It is a niche solution.
May 28 2014
On Wednesday, 28 May 2014 at 17:27:20 UTC, Dicebot wrote:
> Adding a GC pointer type does not enable anything that you can't do right now for high-level applications and does not help low-level applications at all. It is a niche solution.

A niche solution is fine by me. Etienne has expressed interest in creating a D to asm.js converter. Now, maybe the current D is not suitable for that, but perhaps a dialect of D would be. I could back that. That makes two who are interested. Add 2-3 more people and we could have a train going to a station that is niche… but productive.
May 28 2014
On Wednesday, 28 May 2014 at 17:35:18 UTC, Ola Fosheim Grøstad wrote:
> On Wednesday, 28 May 2014 at 17:27:20 UTC, Dicebot wrote:
>> Adding a GC pointer type does not enable anything that you can't do right now for high-level applications and does not help low-level applications at all. It is a niche solution.
>
> A niche solution is fine by me. Etienne has expressed interest in creating a D to asm.js converter. Now, maybe the current D is not suitable for that, but perhaps a dialect of D would be. I could back that. That makes two who are interested. Add 2-3 more people and we could have a train going to a station that is niche… but productive.

Get this into upstream and you will have dozens unhappy about updating their code. It is never that simple.
May 28 2014
On Wednesday, 28 May 2014 at 17:39:21 UTC, Dicebot wrote:
> Get this into upstream and you will have dozens unhappy about updating their code. It is never that simple.

It doesn't have to be in upstream. It could be an experimental compiler implemented in pure D. I would back that, if Etienne is willing.

There is a need for a strongly typed language that can compile to asm.js, PNACL and machine language. I doubt that the current incarnation of D is suitable, so a reduced set of D with better GC/malloc support and tight codegen for small downloads would be welcome for interactive apps.

I agree with you that D2 won't get this, but that does not prevent someone from creating D-.
May 28 2014
On 29 May 2014 03:35, via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> On Wednesday, 28 May 2014 at 17:27:20 UTC, Dicebot wrote:
>> Adding a GC pointer type does not enable anything that you can't do right now for high-level applications and does not help low-level applications at all. It is a niche solution.
>
> A niche solution is fine by me. Etienne has expressed interest in creating a D to asm.js converter. Now, maybe the current D is not suitable for that, but perhaps a dialect of D would be. I could back that. That makes two who are interested. Add 2-3 more people and we could have a train going to a station that is niche… but productive.

I would switch to a D fork in an instant if it satisfied my requirements...
May 31 2014
On Saturday, 31 May 2014 at 13:44:22 UTC, Manu via Digitalmars-d wrote:
> I would switch to a D fork in an instant if it satisfied my requirements...

Maybe the best approach is to:

1. Start with the basic requirements and acceptable restrictions and shrink the semantics to fit them. Then use the DScanner source code as a starting point and emit LLVM asm, and make sure the runtime fits with the restrictions of PNACL and asm.js.
2. Then accept new semantics/syntax and rather provide source2source converters with warnings from D, Swift etc.

Safari is currently working on improving their JS compiler with LLVM tech, so asm.js might perform well on all platforms eventually. Here is one benchmark based on Box2D:

http://www.j15r.com/blog/2014/05/23/Box2d_2014_Update

(What are your "absolute" requirements?)
Jun 04 2014
On Tuesday, 27 May 2014 at 02:52:48 UTC, Etienne Cimon wrote:
> I've been looking at the GC and found that the main problem is that there's no clear information about the pointers. At least smart pointers have some info inside them but GC pointers are completely plain 4-8 bytes and nothing else. The GC is very fast even if it needs to look up this info, but I believe it wouldn't stay low-CPU on a server with 128 GB of RAM and 3 GB/s of memory traffic with a wide range of memory segment sizes.
>
> I think a decent proposal would be to:
>
> 1- Introduce a new GC pointer type, e.g. a void' (it's an apostrophe), used also in classes, which implicitly converts to void* by removing the last bytes (which contain the info). This pointer contains the Pool ID of the underlying memory

I think in SafeD it might be possible to just make this automatic. I don't see it ever happening in D proper though. What happens if I pass a dynamic array of pointers to memset?
Jun 04 2014