digitalmars.D - new should lower to a template function call
- Andrei Alexandrescu (5/5) Jul 22 2020 Was thinking about this, see https://issues.dlang.org/show_bug.cgi?id=21...
- Adam D. Ruppe (10/10) Jul 22 2020 Two thoughts on the subject:
- Walter Bright (2/8) Jul 22 2020 The compiler already does this if the variable being new`d is `scope`.
- Petar Kirov [ZombineDev] (9/19) Jul 22 2020 LDC has an optimization pass [1] which promotes heap allocations
- Petar Kirov [ZombineDev] (3/18) Jul 22 2020 ... building upon the infrastructure in src/dmd/escape.d
- Andrei Alexandrescu (3/22) Jul 23 2020 That'd be awesome. The user just types new Whatever and the compiler
- Walter Bright (5/15) Jul 23 2020 It's a good idea, but I'd consider it a low priority since it is so easy...
- Petar Kirov [ZombineDev] (5/24) Jul 23 2020 I agree that it's not a high priority. Migrating more druntime
- Per Nordlöw (10/20) Jul 23 2020 I actually see 3 (instead of 2) kinds of allocations here:
- Per Nordlöw (35/39) Jul 23 2020 The passing of the unittest in the following module verifies that
- Per Nordlöw (12/14) Jul 23 2020 Interesting.
- Adam D. Ruppe (23/25) Jul 23 2020 Yeah, I know, I just want to make sure that this doesn't change
- Manu (3/9) Jul 22 2020 I've had this thought too. Great idea!
- jmh530 (4/11) Jul 23 2020 How does this fit in with plans for std.experimental.allocator?
- Andrei Alexandrescu (12/23) Jul 23 2020 Allocators need a champion. Ideally a good integration with the GC would...
- Ogi (6/11) Jul 24 2020 There’s emsi-containers library [1]. Also, allocators are used in
- Andre Pany (5/16) Jul 24 2020 Also libdparse and DScanner is using stdx-allocator. Dub-registry
- Atila Neves (10/38) Jul 24 2020 The main issue right now with allocators IMHO are language
- Steven Schveighoffer (17/43) Jul 24 2020 iopipe uses allocators, though I had to write my own default GC
- jmh530 (7/21) Jul 24 2020 I know mir has makeSlice and makeNdSlice that use
- Andrei Alexandrescu (7/9) Jul 24 2020 There's a lot more development beyond that (in which I may take a part,
- Stefan Koch (9/16) Jul 23 2020 As Mathias has already point out in your issue.
- Adam D. Ruppe (3/6) Jul 23 2020 There should be only one new instance per type, and it should be
- Stefan Koch (3/9) Jul 23 2020 It's anyone's guess how many types you actually have.
- Andrei Alexandrescu (30/47) Jul 23 2020 Not a problem. We must go with templates all the way, it's been that way...
- Sebastiaan Koppe (5/21) Jul 24 2020 Yes, I have wished many times that the call to druntime for
- Stefan Koch (35/89) Jul 24 2020 You say that, but we have seen problems, for example with the
- Andrei Alexandrescu (17/32) Jul 24 2020 I looked at that. It's a non-issue. Yes, there's a long mangled name for...
- Seb (13/17) Jul 24 2020 +2
- Steven Schveighoffer (8/11) Jul 24 2020 Yes please! Template implementations make bugs in the runtime so much
- Johan (36/39) Jul 24 2020 Reliance on runtime type information for object creation is not
- H. S. Teoh (38/47) Jul 24 2020 I think, as with all things in language development, there's a
- Andrei Alexandrescu (2/4) Jul 24 2020 What fiasco?
- Andrei Alexandrescu (16/41) Jul 24 2020 We do exactly what you describe would be desirable. All comparisons of
- Stefan Koch (4/17) Jul 25 2020 You (and everyone else) are very welcome to help with the type
- Johan (5/14) Jul 24 2020 This we can easily test. What would it look like if there is an
- Andrei Alexandrescu (29/66) Jul 24 2020 Thanks for answering! I was deliberately exaggerating for dramatic
- Tim (7/9) Jul 24 2020 Programs compiled with -profile=gc are much faster with this
Was thinking about this, see https://issues.dlang.org/show_bug.cgi?id=21065. One problem I noticed with the current instrumentation of allocations is that it is extremely slow. https://github.com/dlang/dmd/pull/11381 takes care of that trivially and quickly because it takes advantage of defining one static variable per instantiation.
Jul 22 2020
Two thoughts on the subject:

1) It should probably always be inlined so it doesn't make new symbols in codegen.

2) The compiler should be free to cheat a little for optimization; if it can see the var never escapes, it might even still pop it on the stack (perhaps the function receives a pointer to the memory and if null, it is responsible for allocing it). This may not be implemented but the spec should at least be written to allow it later. This kind of optimization can be a real winner with scope too.
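The second idea can be sketched as a hypothetical lowering. The name `createC` and the alloc-if-null protocol are assumptions for illustration, not an existing druntime hook:

```d
// Hypothetical lowering sketch: the caller passes pre-reserved stack
// memory when escape analysis proves the object never escapes, or null
// to request a GC heap allocation.
import core.lifetime : emplace;
import core.memory : GC;

class C { int x; this(int x) { this.x = x; } }

C createC(void* stackMem, int arg)
{
    enum size = __traits(classInstanceSize, C);
    void* mem = stackMem !is null ? stackMem : GC.malloc(size);
    return emplace!C(mem[0 .. size], arg);
}

void main()
{
    // Non-escaping case: word-aligned stack buffer provided by the caller.
    size_t[(__traits(classInstanceSize, C) + size_t.sizeof - 1) / size_t.sizeof] buf = void;
    auto onStack = createC(buf.ptr, 1);
    assert(onStack.x == 1);

    // Escaping case: the hook falls back to the GC heap.
    auto onHeap = createC(null, 2);
    assert(onHeap.x == 2);
}
```

Whether the buffer-or-allocate decision is made at the call site or inside the hook is exactly the kind of detail the spec could leave implementation-defined.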
Jul 22 2020
On 7/22/2020 6:05 PM, Adam D. Ruppe wrote:
> 2) the compiler should be free to cheat a little for optimization; if it can see the var never escapes, it might even still pop it on the stack (perhaps the function receives a pointer to the memory and if null, it is responsible for allocing it). This may not be implemented but the spec should at least be written to allow it later. This kind of optimization can be a real winner with scope too.

The compiler already does this if the variable being new`d is `scope`.
Jul 22 2020
On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
> The compiler already does this if the variable being new`d is `scope`.

LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without the user having to manually use the `scope` storage class for function-local variables. Do you think we could formalize this optimization and move it up the pipeline into the front-end, so that it's guaranteed to be performed by all 3 compilers?

[1]: https://github.com/ldc-developers/ldc/blob/v1.23.0-beta1/gen/passes/GarbageCollect2Stack.cpp
Jul 22 2020
On Thursday, 23 July 2020 at 06:13:52 UTC, Petar Kirov [ZombineDev] wrote:
> LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without the user having to manually use the `scope` storage class for function-local variables. [...]

... building upon the infrastructure in src/dmd/escape.d
Jul 22 2020
On 7/23/20 2:15 AM, Petar Kirov [ZombineDev] wrote:
> ... building upon the infrastructure in src/dmd/escape.d

That'd be awesome. The user just types new Whatever and the compiler decides whether to lower or simply use the stack.
Jul 23 2020
On 7/22/2020 11:13 PM, Petar Kirov [ZombineDev] wrote:
> LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without the user having to manually use the `scope` storage class for function-local variables. Do you think we could formalize this optimization and move it up the pipeline into the front-end, so that it's guaranteed to be performed by all 3 compilers?
> [1]: https://github.com/ldc-developers/ldc/blob/v1.23.0-beta1/gen/passes/GarbageCollect2Stack.cpp

It's a good idea, but I'd consider it a low priority since it is so easy to add scope. As for guaranteeing stack placement in the language spec, it's usually regarded as being in the QoI (Quality of Implementation) domain.
Jul 23 2020
On Thursday, 23 July 2020 at 10:08:08 UTC, Walter Bright wrote:
> It's a good idea, but I'd consider it a low priority since it is so easy to add scope. As for guaranteeing stack placement in the language spec, it's usually regarded as being in the QoI (Quality of Implementation) domain.

I agree that it's not a high priority. Migrating more druntime hooks to templates is more impactful, as it unlocks various opportunities, whereas an optimization like this has a relatively fixed improvement potential.
Jul 23 2020
On Thursday, 23 July 2020 at 10:19:41 UTC, Petar Kirov [ZombineDev] wrote:
> I agree that it's not a high priority. Migrating more druntime hooks to templates is more impactful as it unlocks various opportunities, whereas an optimization like this has a relatively fixed improvement potential.

I actually see 3 (instead of 2) kinds of allocations here:

1. GC: destruction during `GC.collect()`
2. C/C++-style heap: (too large to fit on stack) scoped destruction (could be inferred?)
3. Stack: scoped destruction (inferred by LDC)

The 2. is not being discussed here. Is this case obvious or irrelevant? If not, which part of the runtime should handle such allocations? The GC?
Jul 23 2020
On Thursday, 23 July 2020 at 12:16:55 UTC, Per Nordlöw wrote:
> I actually see 3 (instead of 2) kinds of allocations here: [...]

The passing of the unittest in the following module verifies that `scope`-qualified class variables are destructed when they go out of scope even when the GC is disabled. Nice. If the `scope` qualifier is removed from `x` then the unittest fails as expected.

/** Test how destruction of scoped classes is handled.
 *
 * See_Also:
 */
module scoped_class_dtor;

bool g_dtor_called = false;

class C
{
    @safe nothrow @nogc:
    this(int x) { this.x = x; }
    ~this() { g_dtor_called = true; }
    int x;
}

void scopedC() @safe nothrow
{
    scope x = new C(42);
}

unittest
{
    import core.memory : GC;
    GC.disable();
    scopedC();
    assert(g_dtor_called);
    GC.enable();
}

Does this mean that the allocation of `scope`d classes is done on a thread-local heap separate from the GC heap? It would be nice to get a reference to the places in dmd and/or druntime where this logic is defined.
Jul 23 2020
On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
> The compiler already does this if the variable being new`d is `scope`.

Interesting. BTW:

1. Could such `scope`d new's be allocated on a thread-local heap (instead of the global GC heap), thereby eliding the costly spinlock in the current GC implementation, given that we provide a separate lock-less interface in the GC for scoped allocations? That would make new's even faster for small sizes.

2. How could this `scope`-qualification of `new`s be inferred by the compiler using escape analysis of the `new`-ed variable? Do you have any plans on implementing this for the simple cases, Walter? If it's costly, could it be activated in release mode?
Jul 23 2020
On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
> The compiler already does this if the variable being new`d is `scope`.

Yeah, I know, I just want to make sure that this doesn't change and that the spec is written to allow it. Right now, the spec says:

"NewExpressions are used to allocate memory on the garbage collected heap (default) or using a class or struct specific allocator. [...] If a NewExpression is used as an initializer for a function local variable with scope storage class, and the ArgumentList to new is empty, then the instance is allocated on the stack rather than the heap or using the class specific allocator."

So it arguably is a violation of the spec to use stack optimization without the scope keyword, and vice versa. It should really just specify that it allocates and initializes the class while leaving where it does it as implementation-defined.

When lowering to a template, it should also be careful to say it will not necessarily lower to the same thing; a user should NOT expect the template to actually even be called, and the template is free to change its strategy. So the implementation right now is OK and I expect the lowering to a template will be equally OK. Just be careful not to specify too much.
Jul 23 2020
On Thu, Jul 23, 2020 at 10:50 AM Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> Was thinking about this, see https://issues.dlang.org/show_bug.cgi?id=21065. One problem I noticed with the current instrumentation of allocations is that it is extremely slow. https://github.com/dlang/dmd/pull/11381 takes care of that trivially and quickly because it takes advantage of defining one static variable per instantiation.

I've had this thought too. Great idea!
Jul 22 2020
On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu wrote:
> Was thinking about this, see https://issues.dlang.org/show_bug.cgi?id=21065. [...]

How does this fit in with plans for std.experimental.allocator? What are the current plans for std.experimental.allocator?
Jul 23 2020
On 7/23/20 9:34 AM, jmh530 wrote:
> How does this fit in with plans for std.experimental.allocator? What are the current plans for std.experimental.allocator?

Allocators need a champion. Ideally a good integration with the GC would be achieved, but I'm not a GC expert and don't have the time to dedicate to it.

As far as I know there's little use of allocators, which is unlike C++ where there's a lot of excitement around them in spite of a much scarcer API. I recall there's a little use of allocators (copied to code.dlang.org and improved) in Mir. Not much else I've heard of.

I was hoping there'd be a lot of experimentation accumulating with new allocators by now; for example, to this day I have no idea whether FreeTree is any good. (It never rebalances, but I thought I'd wait until someone says the trees get lopsided... still waiting.)
Jul 23 2020
On Friday, 24 July 2020 at 03:29:14 UTC, Andrei Alexandrescu wrote:
> As far as I know there's little use of allocators, which is unlike C++ where there's a lot of excitement around them in spite of a much scarcer API. I recall there's a little use of allocators (copied to code.dlang.org and improved) in Mir. Not much else I've heard of.

There’s emsi-containers library [1]. Also, allocators are used in vibe.d in the form of a dub package [2].

[1] https://github.com/dlang-community/containers
[2] https://github.com/vibe-d/vibe.d/pull/1983
Jul 24 2020
On Friday, 24 July 2020 at 08:24:17 UTC, Ogi wrote:
> There’s emsi-containers library [1]. Also, allocators are used in vibe.d in the form of a dub package [2].

Also libdparse and DScanner are using stdx-allocator. Dub-registry is showing only dependencies but unfortunately not "where used".

Kind regards
Andre
Jul 24 2020
On Friday, 24 July 2020 at 03:29:14 UTC, Andrei Alexandrescu wrote:
> Allocators need a champion. Ideally a good integration with the GC would be achieved but I'm not a GC expert and don't have the time to dedicate to it. [...]

The main issue right now with allocators IMHO are language changes that would allow for the writing of safe smart pointers and the like. Currently, opting out of the GC means putting up with C++-style memory safety bugs. Thankfully ldc gives us asan, but while that helps, it's definitely not a silver bullet.

I want to come up with a DIP but have been busy trying to fix template code emission issues, which also tie into build times.
Jul 24 2020
On 7/23/20 11:29 PM, Andrei Alexandrescu wrote:
> Allocators need a champion. Ideally a good integration with the GC would be achieved but I'm not a GC expert and don't have the time to dedicate to it. [...]

iopipe uses allocators, though I had to write my own default GC allocator, because I needed an allocator that doesn't provide blocks with SCAN set.

At least in one place I used it to avoid using the heap (for a basic output pipe):

https://github.com/schveiguy/iopipe/blob/0974b19e389e0c35779fa2d5b4690f775264239d/source/iopipe/bufpipe.d#L618-L633

And in action here:

https://github.com/schveiguy/httpiopipe/blob/a04d87de3aa3836c07d181263c399416ba005e7c/source/iopipe/http.d#L761

I think something that might help gain more traction is to provide a mechanism to ask for typed data. Some allocators need to do things when they have types. For example, the GC can run destructors on structs when allocated properly. If allocators are not going to provide a complete replacement for GC calls, then one is going to have to use conditional compilation to make it work.

-Steve
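For reference, std.experimental.allocator does already expose a typed entry point, `make`, which forwards constructor arguments; the point above is that the allocator underneath still only sees untyped blocks, so type-aware behavior (such as registering struct destructors the way the GC can) cannot be pushed down into the allocator itself:

```d
// make/dispose are the typed convenience layer on top of untyped
// allocators; the allocator underneath only ever sees raw bytes.
import std.experimental.allocator : make, dispose, theAllocator;

struct S
{
    int a;
    this(int a) { this.a = a; }
}

void main()
{
    S* s = theAllocator.make!S(42); // typed allocation, ctor args forwarded
    assert(s.a == 42);
    theAllocator.dispose(s);        // runs the destructor, then deallocates
}
```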
Jul 24 2020
On Friday, 24 July 2020 at 03:29:14 UTC, Andrei Alexandrescu wrote:
> [snip] I was hoping there'd be a lot of experimentation accumulating with new allocators by now, for example to this day I have no idea whether FreeTree is any good. (It never rebalances, but I thought I'd wait until someone says the trees get lopsided... still waiting).

I know mir has makeSlice and makeNdSlice that use std.experimental.allocator. I can't speak to why people don't use them in D, but it is hard to compare to C++ where it is already part of the standard library.
Jul 24 2020
On 7/24/20 10:08 AM, jmh530 wrote:
> I can't speak to why people don't use them in D, but it is hard to compare to C++ where it is already part of the standard library.

There's a lot more development beyond that (in which I may take a part, too): http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2035r0.pdf

It seems there's a strong recognition within a part of the C++ community that memory allocation is an essential part of fast, scalable applications. Consequently, there's a strong incentive to push for better allocator integration within the language.
Jul 24 2020
On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu wrote:
> Was thinking about this, see https://issues.dlang.org/show_bug.cgi?id=21065. [...]

As Mathias has already pointed out in your issue, you are instantiating more templates. In the _worst_ case this can almost double the number of template instances, i.e. when the new is inside a template itself.

What do you envision the prototype to be like, and why couldn't that be just a function call to the runtime if a hook exists?
Jul 23 2020
On Thursday, 23 July 2020 at 18:51:33 UTC, Stefan Koch wrote:
> In the _worst_ case this can almost double the number of template instances, i.e. when the new is inside a template itself.

There should be only one new instance per type, and it should be trivially inlined....
Jul 23 2020
On Thursday, 23 July 2020 at 19:57:25 UTC, Adam D. Ruppe wrote:
> There should be only one new instance per type, and it should be trivially inlined....

It's anyone's guess how many types you actually have. Template code tends to create a huge number of them.
Jul 23 2020
On 7/23/20 2:51 PM, Stefan Koch wrote:
> As Mathias has already pointed out in your issue, you are instantiating more templates. In the _worst_ case this can almost double the number of template instances, i.e. when the new is inside a template itself.

Not a problem. We must go with templates all the way, it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there's too few templates in there.

When "new C" is issued everything is right there for the grabs - everything! The instance size, the alignment, the constructor to call. Yet what do we do? We gladly throw all that on the floor to call a crummy C function that uses indirect access to get access to those. No wonder -trace=gc is so slow. This entire inefficient pomp and circumstance around creating an object is an embarrassment.

To say nothing about built-in hash tables. It's 2020 and we still use an indirect call for each comparison, right? We should make a vow to fix that until the next Pandemic.

> What do you envision the prototype to be like, and why couldn't that be just a function call to the runtime if a hook exists?

In the simplest form:

template __new_instance(C) if (is(C == class))
{
    static foreach (all overloads of __ctor in C)
    {
        C __new_instance(__appropriate__parameter__set) { ... }
    }
}

This would be a terrific opportunity to fix the perfect forwarding problem.

Further improvements: use introspection to detect if the class defines its own __new_instance, and if it does, defer to it. That way the deprecated feature per-class new makes a comeback the right way.
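A self-contained sketch of what such a templated hook could look like today, using a variadic template and `core.lifetime.forward` for argument forwarding. This is an illustration, not the actual lowering dmd would emit; the hook is named `newInstance` here because double-underscore identifiers are reserved:

```d
// Everything needed -- instance size and the constructor overload --
// is resolved at compile time; no runtime TypeInfo lookup.
import core.lifetime : emplace, forward;
import core.memory : GC;

T newInstance(T, Args...)(auto ref Args args) if (is(T == class))
{
    enum size = __traits(classInstanceSize, T); // compile-time constant
    void[] mem = GC.malloc(size)[0 .. size];
    return emplace!T(mem, forward!args);        // picks the matching ctor
}

class Point
{
    int x, y;
    this(int x, int y) { this.x = x; this.y = y; }
}

void main()
{
    auto p = newInstance!Point(3, 4);
    assert(p.x == 3 && p.y == 4);
}
```

Overload resolution and forwarding fall out of the ordinary template machinery, which is the point: no per-overload boilerplate like the `static foreach` above is strictly required for the common case.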
Jul 23 2020
On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu wrote:
> When "new C" is issued everything is right there for the grabs - everything! The instance size, the alignment, the constructor to call. Yet what do we do? We gladly throw all that on the floor to call a crummy C function that uses indirect access to get access to those. [...]

Yes, I have wished many times that the call to druntime for newing an object was templated. For one, that makes it easier to do without TypeInfo, but in general it gives more control.
Jul 24 2020
On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu wrote:
> Not a problem. We must go with templates all the way, it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there's too few templates in there.

You say that, but we have seen problems, for example with the mangle-length of huge stringswitch instances. Shortly after the introduction, array comparison was bugged because it could not compare char[] and const char[]. They were different types and would compare to false, therefore.

Where are the many success stories? I haven't read any article about how the templates in druntime have helped anyone in great capacity (to be honest, I have not looked either).

> No wonder -trace=gc is so slow. This entire inefficient pomp and circumstance around creating an object is an embarrassment.

gc tracing is slow because it's not a priority to make it fast.

> To say nothing about built-in hash tables. It's 2020 and we still use an indirect call for each comparison, right? We should make a vow to fix that until the next Pandemic.

The built-in hash tables are not cool, true. But that's not because of cmps. Every compiler which does inlining properly can get around the indirect calls.

> In the simplest form: template __new_instance(C) if (is(C == class)) { ... } This would be a terrific opportunity to fix the perfect forwarding problem.

You are aware that this will create a lot of work for semantic, yes? Whereas just emitting a call would be O(1), fast and, most of all, predictable. Making this a template invites the error messages of templates as well. What if the constraint fails? You get "Could not find a matching overload for '__new_instance'". Whereas a solution which uses a function call could give a more informative error message.

> Further improvements: use introspection to detect if the class defines its own __new_instance, and if it does, defer to it. That way the deprecated feature per-class new makes a comeback the right way.

I wonder why the feature was deprecated in the first place. Actually it's not very costly to simply allow two ways of customizing 'opNew'. If that were the case, one could even do direct comparisons of the template vs. the function approach. D has always prided itself on giving the decision to the user; I don't see any compelling reason not to do it here.

Regards, Stefan
Jul 24 2020
On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu wrote:On 7/23/20 2:51 PM, Stefan Koch wrote:You say that, but we have seen problems, for example with the mangle-length of huge stringswitch instances. Shortly after the introduction array comparison was bugged because, it could not compare char[] and const char[]. They were different types and would compare to false, therefore. Where are the many success stories? I haven't read any article about how the templates in druntime have helped anyone in great capacity, (to be honest I have not looked either).On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu wrote:Not a problem. We must go with templates all the way, it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there's too few templates in there.Was thinking about this, see https://issues.dlang.org/show_bug.cgi?id=21065. One problem I noticed with the current instrumentation of allocations is that it is extremely slow. https://github.com/dlang/dmd/pull/11381 takes care of that trivially and quickly because it takes advantage of defining one static variable per instantiation.As Mathias has already point out in your issue. You are instantiating more templates. In the _worst_ case this can almost double the number of template instances. I.E. when the new is inside a template itself.When "new C" is issued everything is right there for the grabs - everything! The instance size, the alignment, the constructor to call. Yet what do we do? We gladly throw all that on the floor to call a crummy C function that uses indirect access to get access to those. No wonder -trace=gc is so slow. This entire inefficient pomp and circumstance around creating an object is an embarrassment.gc tracing is slow because it's not a priority to make it fast. 
Jul 24 2020
On 7/24/20 4:58 AM, Stefan Koch wrote:
> On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu wrote:
>> Not a problem. We must go with templates all the way, it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there's too few templates in there.
>
> You say that, but we have seen problems, for example with the mangle-length of huge stringswitch instances.

I looked at that. It's a non-issue. Yes, there's a long mangled name for a switch statement with strings. It scales up with the number of switch statements (with strings) and the number of strings. It's called once. And that's about it. It's easy to improve (e.g. by hashing) but there's no need to.

> Shortly after the introduction, array comparison was bugged because it could not compare char[] and const char[]. They were different types and would compare to false, therefore.

So there was a bug in the implementation. I recall there was (and there may still be) a problem with things like array comparison because the compiler did it in different ad-hoc ways depending on compile- vs. run-time execution. This would be something very worth looking into.

> Where are the many success stories?

Lucia demonstrated dramatic improvements on array comparison speed: https://www.youtube.com/watch?v=P6ZST80BCIg

> I haven't read any article about how the templates in druntime have helped anyone in great capacity (to be honest, I have not looked either).

Problem is there are not enough. We need to template all we can of druntime. All those silly C-style functions decaying to void* and dependencies and runtime type information must go.
Jul 24 2020
On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu wrote:
> Problem is there are not enough.

+2

> We need to template all we can of druntime. All those silly C-style functions decaying to void* and dependencies and runtime type information must go.

<3

One very important upside that hasn't even been mentioned is that the compiler (+ the C-style functions) are plainly lying to us, as they aren't properly type-checked. The compiler assumes that they are e.g. `@nogc @safe pure nothrow`, but this is not always true. This alone should be motivation enough. Further details regarding the breaking of the type system can be found in Dan's GSoC19 project: https://github.com/dlang/projects/issues/25
Jul 24 2020
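The type-checking problem described above can be illustrated with a small sketch: attributes on an `extern (C)` runtime declaration are taken entirely on faith, while a template hook is compiled from visible source, so its attributes are verified and inferred. All names below are hypothetical, not actual druntime symbols:

```d
import core.memory : GC;

// Attributes on an extern(C) declaration are trusted, never verified --
// the compiler cannot see the implementation to check the claims:
extern (C) void* _hypotheticalRuntimeAlloc(size_t size) pure nothrow @nogc;

// A template hook is type-checked against its body; claiming a false
// attribute (e.g. @nogc here) would be a compile-time error, and the
// true attributes are inferred automatically for templates:
T* typedAlloc(T)() @trusted
{
    return cast(T*) GC.malloc(T.sizeof); // visibly allocates on the GC heap
}

void main()
{
    int* p = typedAlloc!int();
    *p = 41;
    *p += 1;
    assert(*p == 42);
}
```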
On 7/24/20 10:06 AM, Andrei Alexandrescu wrote:
> We need to template all we can of druntime. All those silly C-style functions decaying to void* and dependencies and runtime type information must go.

Yes please! Template implementations make bugs in the runtime so much easier to fix, and make features easier to add. Not to mention all the inference we lose when we don't do it this way.

Consider all the features added to AAs via templates. Even though they have to be thin wrappers around the C-style functions, it allows much better control over which calls are valid.

-Steve
Jul 24 2020
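As a concrete instance of the AA features mentioned above, `object.d`'s `require` and `update` are templates layered over the C-style AA runtime. Because they are templates, mismatched key/value types or callback signatures are rejected at compile time, and attributes are inferred:

```d
void main()
{
    int[string] counts;

    // require: return the existing value, or insert the given one and return it.
    assert(counts.require("x", 10) == 10); // "x" absent: inserts 10
    assert(counts.require("x", 99) == 10); // "x" present: keeps 10

    // update: run `create` if the key is absent, `update` if it is present.
    counts.update("x", () => 1, (ref int v) => v + 1);
    assert(counts["x"] == 11);

    counts.update("y", () => 1, (ref int v) => v + 1);
    assert(counts["y"] == 1);
}
```

Each of these bottoms out in the same untyped `_aaGetY`-style machinery, but the typed wrapper decides statically which calls are valid.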
On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu wrote:
> We need to template all we can of druntime. All those silly C-style functions decaying to void* and dependencies and runtime type information must go.

Reliance on runtime type information for object creation is not good, I agree. But that is easily solved by having the compiler make a call to the ctor (instead of implementing that call in druntime). Another option is to templatize, as you wrote.

I don't agree that everything should be moved to templates in druntime, but I cannot formulate a strong argument against it (I also have not seen strong arguments in favor of it, though). Because "careful consideration" is not a strong point of D development, I am wary.

Templatizing will make certain types of functionality _harder_. If a library/program wants to break, or hook into, object creation inside user code, it will be impossible with templatized functions (you cannot predict the templated function names of code that has not yet been written). For example, AddressSanitizer can only work with non-templatized allocation functions. If malloc was a template, it would be _much_ harder to implement AddressSanitizer support (and currently not possible while keeping source location in error messages).

In favor of compiler-implemented code (instead of druntime) is that the compiler is able to add debug information to generated code, linking the instructions to source locations in user code. I think this will only be possible for druntime-implemented code when: a) the code is force-inlined, and b) there is a way to disable debug information tagging of the instructions inside the druntime-implemented function (such that the source location stays the same as the user's code). (a) is already available in our compilers; (b) is not yet.

Template bloat is a standard concern, because we do not have any defense against it. Did anyone ever look into the array comparison case with template bloat? Force-inlining _and_ not emitting into the binary (currently not available to user/druntime code) would fix that, although then maybe someone else starts to argue about instruction count increase...

-Johan
Jul 24 2020
On Fri, Jul 24, 2020 at 05:14:47PM +0000, Johan via Digitalmars-d wrote:
> On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu wrote:
>> We need to template all we can of druntime. All those silly C-style functions decaying to void* and dependencies and runtime type information must go.
[...]
> I don't agree that everything should be moved to templates in druntime, but I cannot formulate a strong argument against it (I also have not seen strong arguments in favor of it, though). Because "careful consideration" is not a strong point of D development, I am wary.

I think, as with all things in language development, there's a trade-off. Templates are not a silver bullet.

For example, the string-switch fiasco that somebody referred to. Ultimately, IIRC, that was worked around by rewriting std.datetime to refactor away that giant 1000+ (or was it 5000+) case switch statement -- but that did not fix the real problem, which is that when you have a very large number of template arguments, it just introduces a lot of problems, both in the compiler (performance, size of generated symbols, etc.) and in the binary (ridiculous size of symbols, template bloat).

Template bloat is also a cause of concern: if your program had arrays of 100 distinct types, but they are all PODs and therefore comparison could be done with memcmp, then it makes little sense for the compiler to instantiate the array comparison template 100 times and emit 100 copies of code that ultimately is semantically identical to memcmp. Ditto for any other array operation that you have. Force-inlining only helps to a certain extent -- as Johan said, you merely end up with instruction count bloat in lieu of template function bloat. At some point, the cost of having an overly large executable will start to outweigh the (small!) performance hit of replacing instruction count bloat with a function call to memcmp, for example.

Another problem is error messages, as someone else has already pointed out. Although we have seen improvements in template-related error messages, they are still very user-unfriendly, often coming in the form of indirect messages like "this call does not match any of the following 25 overloads", and the user has to sift through pages upon pages of indecipherably long template symbols and somehow deduce that what the compiler *really* wanted to say was "you have a typo on line 123".

I think part of the solution is to adopt Stefan's proposed type functions instead of shoe-horning everything into templates. Just because templates are Turing-complete does not necessarily mean they are practical for implementing every computation. A lot of the druntime templates / prospective druntime templates could possibly be implemented as type functions rather than template functions.

T

--
Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG
Jul 24 2020
On 7/24/20 2:07 PM, H. S. Teoh wrote:
> For example, the string-switch fiasco that somebody referred to.

What fiasco?
Jul 24 2020
On 7/24/20 2:07 PM, H. S. Teoh wrote:
> Template bloat is also a cause of concern: if your program had arrays of 100 distinct types, but they are all PODs and therefore comparison could be done with memcmp, then it makes little sense for the compiler to instantiate the array comparison template 100 times and emit 100 copies of code that ultimately is semantically identical to memcmp.

We do exactly what you describe would be desirable. All comparisons of arrays of scalars boil down to a small piece of code:

https://github.com/dlang/druntime/blob/master/src/core/internal/array/comparison.d#L50

See the generated code: https://godbolt.org/z/4hjWxq

(dmd does not inline that; hopefully it will in the next release. Does not dilute my argument.)

> Ditto for any other array operation that you have. Force-inlining only helps to a certain extent -- as Johan said, you merely end up with instruction count bloat in lieu of template function bloat. At some point, the cost of having an overly large executable will start to outweigh the (small!) performance hit of replacing instruction count bloat with a function call to memcmp, for example.

As I said, with templates you have precise control over merging implementations, inlining vs. not, certain optimizations, etc. Without templates - you don't. It's as simple as that. To paraphrase Steven Wright, going non-templates is a one-way dead-end street.

> Another problem is error messages, as someone else has already pointed out. Although we have seen improvements in template-related error messages, they are still very user-unfriendly, often coming in the form of indirect messages like "this call does not match any of the following 25 overloads", and the user has to sift through pages upon pages of indecipherably long template symbols and somehow deduce that what the compiler *really* wanted to say was "you have a typo on line 123".

There is great opportunity to improve compilers, some of which has been reaped, and much of which is still waiting.

> I think part of the solution is to adopt Stefan's proposed type functions instead of shoe-horning everything into templates. Just because templates are Turing-complete does not necessarily mean they are practical for implementing every computation. A lot of the druntime templates / prospective druntime templates could possibly be implemented as type functions rather than template functions.

That'd be interesting. In the meantime, there's so much to do with the current template offering it hurts.
Jul 24 2020
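The "merging implementations" that the template approach allows can be sketched in a few lines. The hypothetical `arraysEqual` below (a simplification of what druntime's `core/internal/array/comparison.d` does) collapses every integral element type onto one memcmp-backed path, while other types fall back to element-wise comparison:

```d
import core.stdc.string : memcmp;

bool arraysEqual(T)(const T[] a, const T[] b)
{
    if (a.length != b.length)
        return false;
    if (a.length == 0)
        return true;
    static if (__traits(isIntegral, T))
    {
        // All integral element types share this single memcmp-backed body;
        // the per-type instantiations are trivial shims around it.
        return memcmp(a.ptr, b.ptr, a.length * T.sizeof) == 0;
    }
    else
    {
        // Floating point (NaN, -0.0) and non-POD types: memcmp would be
        // wrong, so compare element by element.
        foreach (i; 0 .. a.length)
            if (a[i] != b[i])
                return false;
        return true;
    }
}

void main()
{
    assert(arraysEqual([1, 2, 3], [1, 2, 3]));
    assert(!arraysEqual([1, 2, 3], [1, 2, 4]));
    assert(arraysEqual([1.5], [1.5]));
}
```

With the `static if`, the choice between "100 instantiations" and "one shared implementation" is made in library code, per element type — exactly the control a fixed C-style runtime function cannot offer.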
On Friday, 24 July 2020 at 18:45:39 UTC, Andrei Alexandrescu wrote:
> On 7/24/20 2:07 PM, H. S. Teoh wrote:
>> I think part of the solution is to adopt Stefan's proposed type functions instead of shoe-horning everything into templates. Just because templates are Turing-complete does not necessarily mean they are practical for implementing every computation. A lot of the druntime templates / prospective druntime templates could possibly be implemented as type functions rather than template functions.
>
> That'd be interesting. In the meantime, there's so much to do with the current template offering it hurts.

You (and everyone else) are very welcome to help with the type functions.
Jul 25 2020
On Friday, 24 July 2020 at 18:07:21 UTC, H. S. Teoh wrote:
> Another problem is error messages, as someone else has already pointed out. Although we have seen improvements in template-related error messages, they are still very user-unfriendly, often coming in the form of indirect messages like "this call does not match any of the following 25 overloads", and the user has to sift through pages upon pages of indecipherably long template symbols and somehow deduce that what the compiler *really* wanted to say was "you have a typo on line 123".

This we can easily test. What would it look like if there is an error in a user class constructor when it is called by a druntime function, versus currently?

-Johan
Jul 24 2020
On 7/24/20 1:14 PM, Johan wrote:
> On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu wrote:
>> We need to template all we can of druntime. All those silly C-style functions decaying to void* and dependencies and runtime type information must go.
>
> Reliance on runtime type information for object creation is not good, I agree. But that is easily solved by having the compiler make a call to the ctor (instead of implementing that call in druntime). Another option is to templatize, as you wrote.

Thanks for answering! I was deliberately exaggerating for dramatic purposes, so it's great to see such an even-keeled analytical response.

> I don't agree that everything should be moved to templates in druntime, but I cannot formulate a strong argument against it (I also have not seen strong arguments in favor of it, though). Because "careful consideration" is not a strong point of D development, I am wary.
>
> Templatizing will make certain types of functionality _harder_. If a library/program wants to break, or hook into, object creation inside user code, it will be impossible with templatized functions (you cannot predict the templated function names of code that has not yet been written). For example, AddressSanitizer can only work with non-templatized allocation functions. If malloc was a template, it would be _much_ harder to implement AddressSanitizer support (and currently not possible while keeping source location in error messages).

I think there's a simple pro-template argument that I've run into often in my C++ projects: template to non-template is a one-way street, because going from static type information to non-static type information is a one-way street. You can go from a templated approach to a non-templated one with a one-liner. You can go from static composition to dynamic. You can do type erasure but not type "reconstruction". Going the other way is much more difficult, for obvious reasons.

That's why deferring the decision to lose type information is often a good stance for a designer; it offers maximum flexibility because you can easily revisit and override it. (This argument pattern goes for other things, too: making a fast subsystem safe vs. the converse comes to mind.)

Applied to druntime: if using runtime functions is desirable, the templates can just forward to them. So once we decide to use templates for fundamental runtime support, changing our mind selectively is trivial. Going the opposite way (which is what we're doing now) is a difficult path, with compiler changes and all.

> In favor of compiler-implemented code (instead of druntime) is that the compiler is able to add debug information to generated code, linking the instructions to source locations in user code. I think this will only be possible for druntime-implemented code when: a) the code is force-inlined, and b) there is a way to disable debug information tagging of the instructions inside the druntime-implemented function (such that the source location stays the same as the user's code). (a) is already available in our compilers; (b) is not yet.
>
> Template bloat is a standard concern, because we do not have any defense against it. Did anyone ever look into the array comparison case with template bloat? Force-inlining _and_ not emitting into the binary (currently not available to user/druntime code) would fix that, although then maybe someone else starts to argue about instruction count increase...

I'm tuned to some of the C++ standardization mailing lists. Template bloat has come up in the discussion of about every template-related feature addition since 1990, occasionally with numbers. Over time these arguments died out: the ease of opting out of templates where appropriate and the many tactical improvements made by compiler writers have simply sent that argument to the dustbin of history. All C++ features of any consequence since decades are imbued with genericity.

It has been undeniable that going templated all the way has worked wonders for C++. We'd do good to learn from that.
Jul 24 2020
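The "templates can just forward to them" point is easy to demonstrate: inside a typed template, erasing the type to call an untyped runtime routine is a one-liner, while nothing could reconstruct the type from a `void*` going the other way. A sketch with hypothetical names:

```d
import core.memory : GC;

// A C-style, untyped entry point (a stand-in for a runtime hook; an ASan-style
// tool could still interpose on this single function).
void* untypedAlloc(size_t size)
{
    return GC.malloc(size);
}

struct Pair { int a, b; }

// The typed template keeps full static type information and can erase it on
// the way down with a single call -- the reverse direction does not exist.
T* typedNew(T, Args...)(Args args)
{
    auto p = cast(T*) untypedAlloc(T.sizeof); // one-liner type erasure
    *p = T(args);                             // direct, statically typed init
    return p;
}

void main()
{
    auto p = typedNew!Pair(3, 4);
    assert(p.a == 3 && p.b == 4);
}
```

If the untyped hook later proves undesirable, only `typedNew`'s body changes; callers and their static type information are untouched — which is the deferred-decision flexibility argued for above.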
On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu wrote:
> One problem I noticed with the current instrumentation of allocations is that it is extremely slow.

Programs compiled with -profile=gc are much faster with this change: https://github.com/dlang/druntime/pull/3164

The problem was that GC.stats was called for every allocation, but GC.stats does more than necessary for -profile=gc. Unfortunately this change breaks the ABI of druntime.
Jul 24 2020