
digitalmars.D - new should lower to a template function call

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Was thinking about this, see https://issues.dlang.org/show_bug.cgi?id=21065.

One problem I noticed with the current instrumentation of allocations is 
that it is extremely slow. https://github.com/dlang/dmd/pull/11381 takes 
care of that trivially and quickly because it takes advantage of 
defining one static variable per instantiation.
Jul 22 2020
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
Two thoughts on the subject:

1) it should probably always be inlined so it doesn't make new 
symbols in codegen.

2) the compiler should be free to cheat a little for 
optimization; if it can see the var never escapes, it might even 
still pop it on the stack (perhaps the function receives a 
pointer to the memory and if null, it is responsible for allocing 
it). This may not be implemented but the spec should at least be 
written to allow it later. This kind of optimization can be a 
real winner with scope too.
Jul 22 2020
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/22/2020 6:05 PM, Adam D. Ruppe wrote:
 2) the compiler should be free to cheat a little for optimization; if it can
see 
 the var never escapes, it might even still pop it on the stack (perhaps the 
 function receives a pointer to the memory and if null, it is responsible for 
 allocing it). This may not be implemented but the spec should at least be 
 written to allow it later. This kind of optimization can be a real winner with 
 scope too.
The compiler already does this if the variable being `new`'d is `scope`.
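A minimal sketch of this behavior (`Widget` is an illustrative class, not from the thread):

```d
class Widget
{
    int x;
    this(int x) { this.x = x; }
    ~this() { /* runs deterministically when the scope ends */ }
}

void demo()
{
    // `scope` tells the compiler the reference cannot escape, so the
    // instance is placed on the stack instead of the GC heap, and the
    // destructor runs when `demo` returns.
    scope w = new Widget(42);
    assert(w.x == 42);
}
```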
Jul 22 2020
next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
 On 7/22/2020 6:05 PM, Adam D. Ruppe wrote:
 2) the compiler should be free to cheat a little for 
 optimization; if it can see the var never escapes, it might 
 even still pop it on the stack (perhaps the function receives 
 a pointer to the memory and if null, it is responsible for 
 allocing it). This may not be implemented but the spec should 
 at least be written to allow it later. This kind of 
 optimization can be a real winner with scope too.
The compiler already does this if the variable being new`d is `scope`.
LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without the user having to manually use the `scope` storage class for function-local variables. Do you think we could formalize this optimization and move it up the pipeline into the front-end, so that it's guaranteed to be performed by all 3 compilers?

[1]: https://github.com/ldc-developers/ldc/blob/v1.23.0-beta1/gen/passes/GarbageCollect2Stack.cpp
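A sketch of the kind of code such a pass can transform (illustrative example, not taken from the LDC test suite):

```d
class Counter { int n; }

int f()
{
    // `c` never escapes f, so the pass can replace the GC heap
    // allocation with a stack allocation even without `scope`.
    auto c = new Counter;
    c.n = 3;
    return c.n;   // only the int escapes, not the reference
}
```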
Jul 22 2020
next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Thursday, 23 July 2020 at 06:13:52 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
 On 7/22/2020 6:05 PM, Adam D. Ruppe wrote:
 [...]
The compiler already does this if the variable being new`d is `scope`.
LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without the user having to manually use the `scope` storage class for function-local variables. Do you think we could formalize this optimization and move it up the pipeline into the front-end, so that it's guaranteed to be performed by all 3 compilers? [1]: https://github.com/ldc-developers/ldc/blob/v1.23.0-beta1/gen/passes/GarbageCollect2Stack.cpp
... building upon the infrastructure in src/dmd/escape.d
Jul 22 2020
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/23/20 2:15 AM, Petar Kirov [ZombineDev] wrote:
 On Thursday, 23 July 2020 at 06:13:52 UTC, Petar Kirov [ZombineDev] wrote:
 On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
 On 7/22/2020 6:05 PM, Adam D. Ruppe wrote:
 [...]
The compiler already does this if the variable being new`d is `scope`.
LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without the user having to manually use the `scope` storage class for function-local variables. Do you think we could formalize this optimization and move it up the pipeline into the front-end, so that it's guaranteed to be performed by all 3 compilers? [1]: https://github.com/ldc-developers/ldc/blob/v1.23.0-beta1/gen/passes/GarbageCollect2Stack.cpp
... building upon the infrastructure in src/dmd/escape.d
That'd be awesome. The user just types new Whatever and the compiler decides whether to lower or simply use the stack.
Jul 23 2020
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 7/22/2020 11:13 PM, Petar Kirov [ZombineDev] wrote:
 The compiler already does this if the variable being new`d is `scope`.
LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without the user having to manually use the `scope` storage class for function-local variables. Do you think we could formalize this optimization and move it up the pipeline into the front-end, so that it's guaranteed to be performed by all 3 compilers? [1]: https://github.com/ldc-developers/ldc/blob/v1.23.0-beta1/gen/passes/GarbageCollect2Stack.cpp
It's a good idea, but I'd consider it a low priority since it is so easy to add `scope`. As for guaranteeing stack placement in the language spec, that's usually regarded as being in the QoI (Quality of Implementation) domain.
Jul 23 2020
parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Thursday, 23 July 2020 at 10:08:08 UTC, Walter Bright wrote:
 On 7/22/2020 11:13 PM, Petar Kirov [ZombineDev] wrote:
 The compiler already does this if the variable being new`d is 
 `scope`.
LDC has an optimization pass [1] which promotes heap allocations to stack allocations, without  the user having to manually use the `scope` storage class for function-local variables. Do you think we could formalize this optimization and move it up the pipeline into the front-end, so that it's guaranteed to be performed by all 3 compilers? [1]: https://github.com/ldc-developers/ldc/blob/v1.23.0-beta1/gen/passes/GarbageCollect2Stack.cpp
It's a good idea, but I'd consider it a low priority since it is so easy to add scope. As for guaranteeing stack placement in the language spec, it's usually regarded as being in the QoI (Quality of Implementation) domain.
I agree that it's not a high priority. Migrating more druntime hooks to templates is more impactful, as it unlocks various opportunities, whereas an optimization like this has a relatively fixed improvement potential.
Jul 23 2020
parent reply Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 23 July 2020 at 10:19:41 UTC, Petar Kirov 
[ZombineDev] wrote:
 It's a good idea, but I'd consider it a low priority since it 
 is so easy to add scope.

 As for guaranteeing stack placement in the language spec, it's 
 usually regarded as being in the QoI (Quality of 
 Implementation) domain.
I agree that it's not a high priority. Migrating more druntime hooks to templates is more impactful, as it unlocks various opportunities, whereas an optimization like this has a relatively fixed improvement potential.
I actually see 3 (instead of 2) kinds of allocations here:

1. GC: destruction during `GC.collect()`
2. C/C++-style heap: (too large to fit on stack) scoped destruction (could be inferred?)
3. Stack: scoped destruction (inferred by LDC)

The 2nd is not being discussed here. Is this case obvious or irrelevant? If not, which part of the runtime should handle such allocations? The GC?
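The three kinds, sketched (illustrative; a plain malloc'd buffer stands in for case 2):

```d
import core.stdc.stdlib : free, malloc;

class C { int x; }

void kinds()
{
    // 1. GC heap: finalized whenever GC.collect() decides to run
    auto a = new C;

    // 2. C/C++-style heap: explicit, scoped lifetime, e.g. for data
    //    too large for the stack (buffer used for illustration)
    void* buf = malloc(1024 * 1024);
    scope(exit) free(buf);

    // 3. Stack: `scope` forces stack placement with scoped destruction
    scope b = new C;
}
```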
Jul 23 2020
parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 23 July 2020 at 12:16:55 UTC, Per Nordlöw wrote:
 I actually see 3 (instead of 2) kinds of allocations here:

 1. GC: destruction during `GC.collect()`
 2. C/C++-style heap: (too large to fit on stack) scoped 
 destruction (could be inferred?)
The passing of the unittest in the following module verifies that `scope`-qualified class variables are destructed when they go out of scope even when the GC is disabled. Nice. If the `scope` qualifier is removed from `x`, the unittest fails as expected.

/** Test how destruction of scoped classes is handled.
 *
 * See_Also:
 */
module scoped_class_dtor;

bool g_dtor_called = false;

class C
{
    @safe nothrow @nogc:
    this(int x) { this.x = x; }
    ~this() { g_dtor_called = true; }
    int x;
}

void scopedC() @safe nothrow
{
    scope x = new C(42);
}

unittest
{
    import core.memory : GC;
    GC.disable();
    scopedC();
    assert(g_dtor_called);
    GC.enable();
}

Does this mean that the allocation of `scope`d classes is done on a thread-local heap separate from the GC heap? It would be nice to get a reference to the places in dmd and/or druntime where this logic is defined.
Jul 23 2020
prev sibling next sibling parent Per =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
 The compiler already does this if the variable being new`d is 
 `scope`.
Interesting. BTW:

1. Could such `scope`d `new`s be allocated on a thread-local heap (instead of the global GC heap), thereby eliding the costly spinlock in the current GC implementation, given that we provide a separate lock-less interface in the GC for scoped allocations? That would make `new`s even faster for small sizes.

2. How could this `scope`-qualification of `new`s be inferred by the compiler using escape analysis of the `new`-ed variable? Do you have any plans on implementing this for the simple cases, Walter? If it's costly, could it be activated only in release mode?
Jul 23 2020
prev sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 23 July 2020 at 05:12:30 UTC, Walter Bright wrote:
 The compiler already does this if the variable being new`d is 
 `scope`.
Yeah, I know, I just want to make sure that this doesn't change and that the spec is written to allow it. Right now, the spec says:

"NewExpressions are used to allocate memory on the garbage collected heap (default) or using a class or struct specific allocator. [...] If a NewExpression is used as an initializer for a function local variable with scope storage class, and the ArgumentList to new is empty, then the instance is allocated on the stack rather than the heap or using the class specific allocator."

So it arguably is a violation of the spec to use the stack optimization without the `scope` keyword, and vice versa. It should really just specify that it allocates and initializes the class, while leaving where it does so as implementation-defined.

When lowering to a template, it should also be careful to say it will not necessarily lower to the same thing; a user should NOT expect the template to actually even be called, and the template is free to change its strategy.

So the implementation right now is OK, and I expect the lowering to a template will be equally OK. Just be careful not to specify too much.
Jul 23 2020
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On Thu, Jul 23, 2020 at 10:50 AM Andrei Alexandrescu via Digitalmars-d <digitalmars-d puremagic.com> wrote:

 Was thinking about this, see
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of allocations is
 that it is extremely slow. https://github.com/dlang/dmd/pull/11381 takes
 care of that trivially and quickly because it takes advantage of
 defining one static variable per instantiation.
I've had this thought too. Great idea!
Jul 22 2020
prev sibling next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu 
wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of 
 allocations is that it is extremely slow. 
 https://github.com/dlang/dmd/pull/11381 takes care of that 
 trivially and quickly because it takes advantage of defining 
 one static variable per instantiation.
How does this fit in with plans for std.experimental.allocator? What are the current plans for std.experimental.allocator?
Jul 23 2020
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/23/20 9:34 AM, jmh530 wrote:
 On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of allocations 
 is that it is extremely slow. https://github.com/dlang/dmd/pull/11381 
 takes care of that trivially and quickly because it takes advantage of 
 defining one static variable per instantiation.
How does this fit in with plans for std.experimental.allocator? What are the current plans for std.experimental.allocator?
Allocators need a champion. Ideally a good integration with the GC would be achieved, but I'm not a GC expert and don't have the time to dedicate to it.

As far as I know there's little use of allocators, which is unlike C++, where there's a lot of excitement around them in spite of a much scarcer API. I recall there's a little use of allocators (copied to code.dlang.org and improved) in Mir. Not much else I've heard of.

I was hoping there'd be a lot of experimentation accumulated with new allocators by now; for example, to this day I have no idea whether FreeTree is any good. (It never rebalances, but I thought I'd wait until someone says the trees get lopsided... still waiting.)
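For reference, FreeTree in use (a sketch, hedged on the exact std.experimental.allocator API): it caches freed blocks in a size-keyed binary search tree and serves later requests from there before asking the parent allocator.

```d
import std.experimental.allocator.building_blocks.free_tree : FreeTree;
import std.experimental.allocator.mallocator : Mallocator;

void demo()
{
    FreeTree!Mallocator ft;
    void[] a = ft.allocate(64);   // falls through to Mallocator
    ft.deallocate(a);             // cached in the tree, not freed
    void[] b = ft.allocate(64);   // served from the tree
    ft.deallocate(b);
}
```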
Jul 23 2020
next sibling parent reply Ogi <ogion.art gmail.com> writes:
On Friday, 24 July 2020 at 03:29:14 UTC, Andrei Alexandrescu 
wrote:
 As far as I know there's little use of allocators, which is 
 unlike C++ where there's a lot of excitement around them in 
 spite of a much scarcer API. I recall there's a little use of 
 allocators (copied to code.dlang.org and improved) in Mir. Not 
 much else I've heard of.
There’s the emsi-containers library [1]. Also, allocators are used in vibe.d in the form of a dub package [2].

[1] https://github.com/dlang-community/containers
[2] https://github.com/vibe-d/vibe.d/pull/1983
Jul 24 2020
parent Andre Pany <andre s-e-a-p.de> writes:
On Friday, 24 July 2020 at 08:24:17 UTC, Ogi wrote:
 On Friday, 24 July 2020 at 03:29:14 UTC, Andrei Alexandrescu 
 wrote:
 As far as I know there's little use of allocators, which is 
 unlike C++ where there's a lot of excitement around them in 
 spite of a much scarcer API. I recall there's a little use of 
 allocators (copied to code.dlang.org and improved) in Mir. Not 
 much else I've heard of.
There’s emsi-containers library [1]. Also, allocators are used in vibe.d in the form of dub package [2]. [1] https://github.com/dlang-community/containers [2] https://github.com/vibe-d/vibe.d/pull/1983
Also, libdparse and DScanner are using stdx-allocator. The dub registry shows only dependencies but unfortunately not "where used".

Kind regards
Andre
Jul 24 2020
prev sibling next sibling parent Atila Neves <atila.neves gmail.com> writes:
On Friday, 24 July 2020 at 03:29:14 UTC, Andrei Alexandrescu 
wrote:
 On 7/23/20 9:34 AM, jmh530 wrote:
 On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu 
 wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of 
 allocations is that it is extremely slow. 
 https://github.com/dlang/dmd/pull/11381 takes care of that 
 trivially and quickly because it takes advantage of defining 
 one static variable per instantiation.
How does this fit in with plans for std.experimental.allocator? What are the current plans for std.experimental.allocator?
Allocators need a champion. Ideally a good integration with the GC would be achieved but I'm not a GC expert and don't have the time to dedicate to it. As far as I know there's little use of allocators, which is unlike C++ where there's a lot of excitement around them in spite of a much scarcer API. I recall there's a little use of allocators (copied to code.dlang.org and improved) in Mir. Not much else I've heard of. I was hoping there'd be a lot of experimentation accumulating with new allocators by now, for example to this day I have no idea whether FreeTree is any good. (It never rebalances, but I thought I'd wait until someone says the trees get lopsided... still waiting).
The main issue right now with allocators IMHO is the lack of language changes that would allow for the writing of @safe smart pointers and the like. Currently, opting out of the GC means putting up with C++-style memory safety bugs. Thankfully ldc gives us asan, but while that helps, it's definitely not a silver bullet.

I want to come up with a DIP but have been busy trying to fix template code emission issues, which also tie into build times.
Jul 24 2020
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/23/20 11:29 PM, Andrei Alexandrescu wrote:
 On 7/23/20 9:34 AM, jmh530 wrote:
 On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of allocations 
 is that it is extremely slow. https://github.com/dlang/dmd/pull/11381 
 takes care of that trivially and quickly because it takes advantage 
 of defining one static variable per instantiation.
How does this fit in with plans for std.experimental.allocator? What are the current plans for std.experimental.allocator?
Allocators need a champion. Ideally a good integration with the GC would be achieved but I'm not a GC expert and don't have the time to dedicate to it. As far as I know there's little use of allocators, which is unlike C++ where there's a lot of excitement around them in spite of a much scarcer API. I recall there's a little use of allocators (copied to code.dlang.org and improved) in Mir. Not much else I've heard of. I was hoping there'd be a lot of experimentation accumulating with new allocators by now, for example to this day I have no idea whether FreeTree is any good. (It never rebalances, but I thought I'd wait until someone says the trees get lopsided... still waiting).
iopipe uses allocators, though I had to write my own default GC allocator, because I needed an allocator that doesn't provide blocks with the SCAN bit set. At least in one place I used it to avoid using the heap (for a basic output pipe):

https://github.com/schveiguy/iopipe/blob/0974b19e389e0c35779fa2d5b4690f775264239d/source/iopipe/bufpipe.d#L618-L633

And in action here:

https://github.com/schveiguy/httpiopipe/blob/a04d87de3aa3836c07d181263c399416ba005e7c/source/iopipe/http.d#L761

I think something that might help gain more traction is to provide a mechanism to ask for typed data. Some allocators need to do things when they have types. For example, the GC can run destructors on structs when they are allocated properly. If allocators are not going to provide a complete replacement for GC calls, then one is going to have to use conditional compilation to make it work.

-Steve
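Part of the typed-allocation mechanism asked for above already exists as `make`/`dispose` in std.experimental.allocator, which know the type and so run constructors and destructors (a sketch; `S` is illustrative):

```d
import std.experimental.allocator : dispose, make;
import std.experimental.allocator.mallocator : Mallocator;

struct S
{
    int x;
    ~this() { /* runs inside dispose */ }
}

void demo()
{
    // make allocates and constructs; dispose destructs and deallocates.
    S* s = Mallocator.instance.make!S(42);
    scope(exit) Mallocator.instance.dispose(s);
    assert(s.x == 42);
}
```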
Jul 24 2020
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Friday, 24 July 2020 at 03:29:14 UTC, Andrei Alexandrescu 
wrote:
 [snip]

 Allocators need a champion. Ideally a good integration with the 
 GC would be achieved but I'm not a GC expert and don't have the 
 time to dedicate to it.

 As far as I know there's little use of allocators, which is 
 unlike C++ where there's a lot of excitement around them in 
 spite of a much scarcer API. I recall there's a little use of 
 allocators (copied to code.dlang.org and improved) in Mir. Not 
 much else I've heard of.

 I was hoping there'd be a lot of experimentation accumulating 
 with new allocators by now, for example to this day I have no 
 idea whether FreeTree is any good. (It never rebalances, but I 
 thought I'd wait until someone says the trees get lopsided... 
 still waiting).
I know mir has makeSlice and makeNdSlice that use std.experimental.allocator. I can't speak to why people don't use them in D, but it is hard to compare to C++, where allocators are already part of the standard library.
Jul 24 2020
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/24/20 10:08 AM, jmh530 wrote:
 I can't speak to why people don't use them in D, but it is hard to 
 compare to C++ where it is already part of the standard library.
There's a lot more development beyond that (in which I may take a part, too):

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p2035r0.pdf

It seems there's a strong recognition within a part of the C++ community that memory allocation is an essential part of fast, scalable applications. Consequently, there's a strong incentive to push for better allocator integration within the language.
Jul 24 2020
prev sibling next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu 
wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of 
 allocations is that it is extremely slow. 
 https://github.com/dlang/dmd/pull/11381 takes care of that 
 trivially and quickly because it takes advantage of defining 
 one static variable per instantiation.
As Mathias has already pointed out in your issue, you are instantiating more templates. In the _worst_ case this can almost double the number of template instances, i.e. when the `new` is inside a template itself.

What do you envision the prototype to be like, and why couldn't that be just a function call to the runtime if a hook exists?
Jul 23 2020
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 23 July 2020 at 18:51:33 UTC, Stefan Koch wrote:
 In the _worst_ case this can almost double the number of 
 template instances.
 I.E. when the new is inside a template itself.
There should be only one new instance per type, and it should be trivially inlined....
Jul 23 2020
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 23 July 2020 at 19:57:25 UTC, Adam D. Ruppe wrote:
 On Thursday, 23 July 2020 at 18:51:33 UTC, Stefan Koch wrote:
 In the _worst_ case this can almost double the number of 
 template instances.
 I.E. when the new is inside a template itself.
There should be only one new instance per type, and it should be trivially inlined....
It's anyone's guess how many types you actually have. Template code tends to create a huge number of them.
Jul 23 2020
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/23/20 2:51 PM, Stefan Koch wrote:
 On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of allocations 
 is that it is extremely slow. https://github.com/dlang/dmd/pull/11381 
 takes care of that trivially and quickly because it takes advantage of 
 defining one static variable per instantiation.
As Mathias has already pointed out in your issue, you are instantiating more templates. In the _worst_ case this can almost double the number of template instances, i.e. when the `new` is inside a template itself.
Not a problem. We must go with templates all the way; it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there are too few templates in there.

When "new C" is issued, everything is right there for the grabs - everything! The instance size, the alignment, the constructor to call. Yet what do we do? We gladly throw all that on the floor to call a crummy C function that uses indirect access to get at those. No wonder -trace=gc is so slow. This entire inefficient pomp and circumstance around creating an object is an embarrassment.

To say nothing about built-in hash tables. It's 2020 and we still use an indirect call for each comparison, right? We should make a vow to fix that before the next pandemic.
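What such a statically informed lowering might look like (a hypothetical sketch; `newInstance` is a made-up name, and GC finalization attributes and attribute inference are elided):

```d
T newInstance(T, Args...)(auto ref Args args) if (is(T == class))
{
    import core.memory : GC;
    import core.stdc.string : memcpy;

    // Size and initializer are statically known per instantiation;
    // no runtime TypeInfo lookup is needed at the call site.
    enum size = __traits(classInstanceSize, T);
    void* p = GC.malloc(size);
    memcpy(p, typeid(T).initializer.ptr, size);
    auto obj = cast(T) p;
    static if (__traits(hasMember, T, "__ctor"))
        obj.__ctor(args);   // direct call to the matching overload
    return obj;
}
```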
 What do you envision the prototype to be like, and why couldn't that be 
 just a function call to the runtime if a hook exists?
In the simplest form:

template __new_instance(C) if (is(C == class))
{
    static foreach (all overloads of __ctor in C)
    {
        C __new_instance(__appropriate__parameter__set)
        {
            ...
        }
    }
}

This would be a terrific opportunity to fix the perfect forwarding problem.

Further improvements: use introspection to detect if the class defines its own __new_instance, and if it does, defer to it. That way the deprecated per-class new feature makes a comeback the right way.
Jul 23 2020
next sibling parent Sebastiaan Koppe <mail skoppe.eu> writes:
On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu 
wrote:
 On 7/23/20 2:51 PM, Stefan Koch wrote:
 In the _worst_ case this can almost double the number of 
 template instances.
 I.E. when the new is inside a template itself.
Not a problem. We must go with templates all the way, it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there's too few templates in there. When "new C" is issued everything is right there for the grabs - everything! The instance size, the alignment, the constructor to call. Yet what do we do? We gladly throw all that on the floor to call a crummy C function that uses indirect access to get access to those. No wonder -trace=gc is so slow. This entire inefficient pomp and circumstance around creating an object is an embarrassment.
Yes, I have wished many times that the call to druntime for newing an object was templated. For one, that makes it easier to do without TypeInfo, but in general it gives more control.
Jul 24 2020
prev sibling next sibling parent Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu 
wrote:
 On 7/23/20 2:51 PM, Stefan Koch wrote:
 On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu 
 wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of 
 allocations is that it is extremely slow. 
 https://github.com/dlang/dmd/pull/11381 takes care of that 
 trivially and quickly because it takes advantage of defining 
 one static variable per instantiation.
As Mathias has already point out in your issue. You are instantiating more templates. In the _worst_ case this can almost double the number of template instances. I.E. when the new is inside a template itself.
Not a problem. We must go with templates all the way, it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there's too few templates in there.
You say that, but we have seen problems, for example with the mangle length of huge string-switch instances. Shortly after the introduction, array comparison was bugged because it could not compare char[] and const char[]; they were different types and would therefore compare to false.

Where are the many success stories? I haven't read any article about how the templates in druntime have helped anyone in great capacity (to be honest, I have not looked either).
 When "new C" is issued everything is right there for the grabs 
 - everything! The instance size, the alignment, the constructor 
 to call. Yet what do we do? We gladly throw all that on the 
 floor to call a crummy C function that uses indirect access to 
 get access to those. No wonder -trace=gc is so slow. This 
 entire inefficient pomp and circumstance around creating an 
 object is an embarrassment.
GC tracing is slow because it's not a priority to make it fast. Every compiler which does inlining properly can get around the indirect calls.
 To say nothing about built-in hash tables. It's 2020 and we 
 still use an indirect call for each comparison, right? We 
 should make a vow to fix that until the next Pandemic.
The built-in hash tables are not cool, true. But that's not because of the comparisons.
 What do you envision the prototype to be like, and why 
 couldn't that be just a function call to the runtime if a hook 
 exists?
In the simplest form: template __new_instance(C) if (is(C == class)) { static foreach (all overloads of __ctor in C) { C __new_instance(__appropriate__parameter__set) { ... } } } This would be a terrific opportunity to fix the perfect forwarding problem.
You are aware that this will create a lot of work for semantic analysis, yes? Whereas just emitting a call would be O(1), fast and, most of all, predictable.

Making this a template invites the error messages of templates as well. If the constraint fails, you get "Could not find a matching overload for '__new_instance'", whereas a solution which uses a function call could give a more informative error message.
 Further improvements: use introspection to detect if the class 
 defines its own __new_instance, and if it does, defer to it. 
 That way the deprecated feature per-class new makes a comeback 
 the right way.
I wonder why the feature was deprecated in the first place. Actually, it's not very costly to simply allow two ways of customizing 'opNew'. If that were the case, one could even do direct comparisons of the template vs. the function approach. D has always prided itself on giving the decision to the user; I don't see any compelling reason not to do it here.

Regards,
Stefan
Jul 24 2020
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu 
wrote:
 On 7/23/20 2:51 PM, Stefan Koch wrote:
 On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu 
 wrote:
 Was thinking about this, see 
 https://issues.dlang.org/show_bug.cgi?id=21065.

 One problem I noticed with the current instrumentation of 
 allocations is that it is extremely slow. 
 https://github.com/dlang/dmd/pull/11381 takes care of that 
 trivially and quickly because it takes advantage of defining 
 one static variable per instantiation.
As Mathias has already point out in your issue. You are instantiating more templates. In the _worst_ case this can almost double the number of template instances. I.E. when the new is inside a template itself.
Not a problem. We must go with templates all the way, it's been that way since the STL was created and there's no going back. All transformations of nonsense C-style crap into templates in object.d have been as many success stories. If anything there's too few templates in there.
You say that, but we have seen problems, for example with the mangle-length of huge stringswitch instances. Shortly after the introduction array comparison was bugged because, it could not compare char[] and const char[]. They were different types and would compare to false, therefore. Where are the many success stories? I haven't read any article about how the templates in druntime have helped anyone in great capacity, (to be honest I have not looked either).
 When "new C" is issued everything is right there for the grabs 
 - everything! The instance size, the alignment, the constructor 
 to call. Yet what do we do? We gladly throw all that on the 
 floor to call a crummy C function that uses indirect access to 
 get access to those. No wonder -trace=gc is so slow. This 
 entire inefficient pomp and circumstance around creating an 
 object is an embarrassment.
gc tracing is slow because it's not a priority to make it fast. Every compiler which does inline-ing properly, can get around the indirect calls.
 To say nothing about built-in hash tables. It's 2020 and we 
 still use an indirect call for each comparison, right? We 
 should make a vow to fix that until the next Pandemic.
The built-in hash tables are not cool, true. But that's not because of the comparisons.
 What do you envision the prototype to be like, and why 
 couldn't that be just a function call to the runtime if a hook 
 exists?
In the simplest form:

    template __new_instance(C) if (is(C == class))
    {
        static foreach (all overloads of __ctor in C)
        {
            C __new_instance(__appropriate__parameter__set) { ... }
        }
    }

This would be a terrific opportunity to fix the perfect forwarding problem.
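Concretely, such a lowering could look like the following. This is a hypothetical sketch, not the actual druntime API: the name `__new_instance` comes from the proposal above, and it uses variadic forwarding in place of the per-overload static foreach (finalizer registration and allocation attributes are omitted for brevity).

```d
// Hypothetical sketch of the lowering -- not the actual druntime API.
// `new C(args)` would become `__new_instance!C(args)`; variadic
// forwarding sidesteps enumerating constructor overloads one by one.
C __new_instance(C, Args...)(auto ref Args args) if (is(C == class))
{
    import core.lifetime : emplace, forward;
    import core.memory : GC;

    // The instance size is known statically right here, at the call site.
    enum size = __traits(classInstanceSize, C);
    void[] mem = GC.malloc(size)[0 .. size];

    // Construct the object in place, forwarding the arguments as-is.
    return emplace!C(mem, forward!args);
}
```

Overload resolution against C's constructors then happens naturally inside `emplace`, which is where the perfect-forwarding benefit shows up.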
You are aware that this will create a lot of work for semantic analysis, yes? Whereas just emitting a call would be O(1), fast and, most of all, predictable. Making this a template invites the error messages of templates as well. If the constraint fails, you get "Could not find a matching overload for '__new_instance'", whereas a solution which uses a function call could give a more informative error message.
 Further improvements: use introspection to detect if the class 
 defines its own __new_instance, and if it does, defer to it. 
 That way the deprecated feature per-class new makes a comeback 
 the right way.
I wonder why the feature was deprecated in the first place. Actually it's not very costly to simply allow two ways of customizing 'opNew'. If that were the case, one could even do direct comparisons of the template vs. the function approach. D has always prided itself on giving the decision to the user; I don't see any compelling reason not to do it here.

Regards, Stefan
Jul 24 2020
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/24/20 4:58 AM, Stefan Koch wrote:
 On Friday, 24 July 2020 at 03:23:07 UTC, Andrei Alexandrescu wrote:
 Not a problem. We must go with templates all the way, it's been that 
 way since the STL was created and there's no going back. All 
 transformations of nonsense C-style crap into templates in object.d 
 have been as many success stories. If anything there's too few 
 templates in there.
You say that, but we have seen problems, for example with the mangle length of huge string-switch instances.
I looked at that. It's a non-issue. Yes, there's a long mangled name for a switch statement with strings. It scales up with the number of switch statements (with strings) and the number of strings. It's called once. And that's about it. It's easy to improve (e.g. by hashing) but there's no need to.
 Shortly after its introduction, array comparison was bugged because it
 could not compare char[] and const char[]: they were different types
 and would therefore compare unequal.
So there was a bug in the implementation. I recall there was (and there may still be) a problem with things like array comparison because the compiler did it in different ad-hoc ways depending on compile- vs. run-time execution. This would be something very worth looking into.
 Where are the many success stories?
Lucia demonstrated dramatic improvements on array comparison speed: https://www.youtube.com/watch?v=P6ZST80BCIg.
 I haven't read any article about how the templates in druntime have
 helped anyone in great capacity (to be honest, I have not looked either).
Problem is there are not enough. We need to template all we can of druntime. All those silly C-style functions decaying to void* and dependencies and runtime type information must go.
Jul 24 2020
next sibling parent Seb <seb wilzba.ch> writes:
On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu 
wrote:
 Problem is there are not enough.
+2
 We need to template all we can of druntime. All those silly 
 C-style functions decaying to void* and dependencies and 
 runtime type information must go.
<3

One very important upside that hasn't even been mentioned is that the compiler (+ the C-style functions) is plainly lying to us, as these functions aren't properly type-checked. The compiler assumes that they are e.g. `@nogc @safe pure nothrow`, but this is not always true. This alone should be motivation enough.

Further details regarding this breakout of the type system can be found in Dan's GSoC19 project: https://github.com/dlang/projects/issues/25
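The mismatch can be sketched like this. The `extern (C)` name below is made up for illustration and is not a real druntime symbol; the point is only the contrast between declared and inferred attributes.

```d
// The C-style hook's attributes are declared, never verified: the
// compiler takes this signature on faith, whatever the body does.
// (`_d_somehook` is a made-up name, not a real druntime symbol.)
extern (C) void[] _d_somehook(size_t n) pure nothrow @trusted;

// A templated hook gets its attributes *inferred* from its body, so a
// @safe pure caller can only use it when the body actually qualifies.
T[] typedHook(T)(size_t n)
{
    return new T[](n); // inferred @safe; @nogc/nothrow only if truly so
}
```

With the template, an attribute violation in the body becomes a compile error at the call site instead of a silent lie.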
Jul 24 2020
prev sibling next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 7/24/20 10:06 AM, Andrei Alexandrescu wrote:
 We need to template all we can of druntime. All those silly C-style 
 functions decaying to void* and dependencies and runtime type 
 information must go.
Yes please! Template implementations make bugs in the runtime so much easier to fix, and make features easier to add. Not to mention all the inference we lose when we don't do it this way.

Consider all the features added to AAs via templates. Even though they have to be thin wrappers around the C-style functions, they allow much better control over which calls are valid.

-Steve
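The thin-wrapper pattern described above can be sketched as follows. The `extern (C)` entry point name and signature here are hypothetical, standing in for druntime's actual AA runtime functions.

```d
// Untyped C-style entry point (hypothetical name and signature).
extern (C) void* _aaLookupX(void* aa, const TypeInfo keyti,
                            scope const void* pkey);

// Thin typed template wrapper: the unsafe, untyped call is confined
// here, while callers get a fully typed, compile-time-checked API.
// A lookup with the wrong key type simply won't compile.
V* lookup(K, V)(V[K] aa, scope const K key) @trusted
{
    return cast(V*) _aaLookupX(cast(void*) aa, typeid(K), &key);
}
```

The wrapper is where inference happens too: attributes and constraints live in D code rather than being hard-wired into the compiler.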
Jul 24 2020
prev sibling parent reply Johan <j j.nl> writes:
On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu 
wrote:
 We need to template all we can of druntime. All those silly 
 C-style functions decaying to void* and dependencies and 
 runtime type information must go.
Reliance on runtime type information for object creation is not good, I agree. But that is easily solved by having the compiler make a call to the ctor (instead of implementing that call in druntime). Another option is to templatize, as you wrote.

I don't agree that everything should be moved to templates in druntime, but I cannot formulate a strong argument against it (I also have not seen strong arguments in favor of it though). Because "careful consideration" is not a strong point of D development, I am wary.

Templatizing will make certain types of functionality _harder_. If a library/program wants to break, or hook into, object creation inside user code, it will be impossible with templatized functions (you cannot predict the templated function names of code that has not yet been written). For example, AddressSanitizer can only work with non-templatized allocation functions. If malloc were a template, it would be _much_ harder to implement AddressSanitizer support (and currently not possible while keeping source location in error messages).

In favor of compiler-implemented code (instead of druntime) is that the compiler is able to add debug information to generated code, linking the instructions to source location in user code. I think this will only be possible for druntime-implemented code when: a) the code is force-inlined and b) there is a way to disable debug information tagging of the instructions inside the druntime-implemented function (such that the source location stays the same as the user's code). (a) is already available in our compilers, (b) is not yet.

Template bloat is a standard concern, because we do not have any defense against it. Did anyone ever look into the array comparison case with template bloat? Force-inlining _and_ not emitting into the binary (currently not available to user/druntime code) would fix that, although then maybe someone else starts to argue about instruction count increase...

-Johan
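The hooking point above in miniature: `_d_allocmemory` is an existing druntime symbol used here as the fixed entry point, while `allocFor` is a made-up example of a templated allocator.

```d
// One fixed, predictable symbol: a sanitizer runtime can interpose it
// at link time, because the name is known before user code is compiled.
extern (C) void* _d_allocmemory(size_t sz);

// A templated allocator emits a fresh mangled symbol per instantiation
// (allocFor!int, allocFor!MyClass, ...), so there is no finite list of
// names a tool could hook ahead of time. (`allocFor` is hypothetical.)
void* allocFor(T)(size_t count)
{
    return _d_allocmemory(count * T.sizeof);
}
```

Unless every instantiation still funnels through one non-templated symbol, tools like AddressSanitizer lose their interposition point.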
Jul 24 2020
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jul 24, 2020 at 05:14:47PM +0000, Johan via Digitalmars-d wrote:
 On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu wrote:
 We need to template all we can of druntime. All those silly C-style
 functions decaying to void* and dependencies and runtime type
 information must go.
[...]
 I don't agree that everything should be moved to templates in
 druntime, but I cannot formulate a strong argument against it (I also
 have not seen strong arguments in favor of it though). Because
 "careful consideration" is not a strong point of D development, I am
 wary.
I think, as with all things in language development, there's a trade-off. Templates are not a silver bullet.

For example, the string-switch fiasco that somebody referred to. Ultimately, IIRC, that was worked around by rewriting std.datetime to refactor away that giant 1000+ (or was it 5000+) case switch statement -- but that did not fix the real problem, which is that when you have a very large number of template arguments, it just introduces a lot of problems, both in the compiler (performance, size of generated symbols, etc.) and in the binary (ridiculous size of symbols, template bloat).

Template bloat is also a cause of concern: if your program had arrays of 100 distinct types, but they are all PODs and therefore comparison could be done with memcmp(), then it makes little sense for the compiler to instantiate the array comparison template 100 times and emit 100 copies of code that ultimately is semantically identical to memcmp. Ditto for any other array operation you may have. Force-inlining only helps to a certain extent -- as Johan said, you merely end up with instruction count bloat in lieu of template function bloat. At some point, the cost of having an overly large executable will start to outweigh the (small!) performance hit of replacing instruction count bloat with a function call to memcmp, for example.

Another problem is error messages, as someone else has already pointed out. Although we have seen improvements in template-related error messages, they are still very user-unfriendly, often coming in the form of indirect messages like "this call does not match any of the following 25 overloads", and the user has to sift through pages upon pages of indecipherably long template symbols and somehow deduce that what the compiler *really* wanted to say was "you have a typo on line 123".

I think part of the solution is to adopt Stefan's proposed type functions instead of shoe-horning everything into templates. Just because templates are Turing-complete does not necessarily mean they are practical for implementing every computation. A lot of the druntime templates / prospective druntime templates could possibly be implemented as type functions rather than template functions.

T

-- 
Error: Keyboard not attached. Press F1 to continue. -- Yoon Ha Lee, CONLANG
Jul 24 2020
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 7/24/20 2:07 PM, H. S. Teoh wrote:
 For example, the
 string-switch fiasco that somebody referred to.
What fiasco?
Jul 24 2020
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 7/24/20 2:07 PM, H. S. Teoh wrote:
 Template bloat is also a cause of concern: if your program had arrays of
 100 distinct types, but they are all PODs and therefore comparison could
 be done with memcmp()'s, then it makes little sense for the compiler to
 instantiate the array comparison template 100 times and emit 100 copies
 of code that ultimately is semantically identical to memcmp.
We do exactly what you describe as desirable. All comparisons of arrays of scalars boil down to a small piece of code: https://github.com/dlang/druntime/blob/master/src/core/internal/array/comparison.d#L50

See the generated code: https://godbolt.org/z/4hjWxq

(dmd does not inline that; hopefully it will in the next release. This does not dilute my argument.)
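In spirit, the linked code does something like the following. This is a simplified sketch, not the actual druntime implementation; floating-point types are excluded from the memcmp path because bitwise comparison mishandles NaN and signed zero.

```d
import core.stdc.string : memcmp;

// Simplified sketch of collapsing POD array comparison onto memcmp;
// the real druntime code (linked above) handles many more cases.
bool arraysEqual(T)(scope const T[] a, scope const T[] b)
{
    if (a.length != b.length)
        return false;
    static if (__traits(isScalar, T) && !__traits(isFloating, T))
    {
        // One memcmp call serves every scalar T: no per-type code.
        return a.length == 0
            || memcmp(a.ptr, b.ptr, a.length * T.sizeof) == 0;
    }
    else
    {
        // Element-wise fallback for types where bit equality is wrong.
        foreach (i; 0 .. a.length)
            if (a[i] != b[i])
                return false;
        return true;
    }
}
```

The template instantiates 100 times for 100 scalar element types, but every instantiation is a thin shim over the same memcmp.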
 Ditto for
 any other array operation that you have have.  Force-inlining only helps
 to a certain extent -- as Johan said, you merely end up with instruction
 count bloat in lieu of template function bloat. At some point, the cost
 of having an overly-large executable will start to outweigh the (small!)
 performance hit of replacing instruction count bloat with a function
 call to memcmp, for example.
As I said, with templates you have precise control over merging implementations, inlining vs. not, certain optimizations, etc. Without templates, you don't. It's as simple as that. To paraphrase Steven Wright, going non-templates is a one-way dead-end street.
 Another problem is error messages, as someone else has already pointed
 out.  Although we have seen improvements in template-related error
 messages, they are still very user-unfriendly, often coming in the form
 of indirect messages like "this call does not match any of the following
 25 overloads", and the user has to sift through pages upon pages of
 indecipherably-long template symbols and somehow deduce that what the
 compiler *really* want to say was "you have a typo on line 123".
There is great opportunity to improve compilers, some of which has been reaped, and much of which is still waiting.
 I think part of the solution is to adopt Stefan's proposed type
 functions instead of shoe-horning everything into templates. Just
 because templates are Turing-complete does not necessarily mean they are
 practical for implementing every computation. A lot of the druntime
 templates / prospective druntime templates could possibly be implemented
 as type functions rather than template functions.
That'd be interesting. In the meantime, there's so much to do with the current template offering it hurts.
Jul 24 2020
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Friday, 24 July 2020 at 18:45:39 UTC, Andrei Alexandrescu 
wrote:
 On 7/24/20 2:07 PM, H. S. Teoh wrote:

 I think part of the solution is to adopt Stefan's proposed type
 functions instead of shoe-horning everything into templates. 
 Just
 because templates are Turing-complete does not necessarily 
 mean they are
 practical for implementing every computation. A lot of the 
 druntime
 templates / prospective druntime templates could possibly be 
 implemented
 as type functions rather than template functions.
That'd be interesting. In the meantime, there's so much to do with the current template offering it hurts.
You (and everyone else) are very welcome to help with the type functions.
Jul 25 2020
prev sibling parent Johan <j j.nl> writes:
On Friday, 24 July 2020 at 18:07:21 UTC, H. S. Teoh wrote:
 Another problem is error messages, as someone else has already 
 pointed out.  Although we have seen improvements in 
 template-related error messages, they are still very 
 user-unfriendly, often coming in the form of indirect messages 
 like "this call does not match any of the following 25 
 overloads", and the user has to sift through pages upon pages 
 of indecipherably-long template symbols and somehow deduce that 
 what the compiler *really* want to say was "you have a typo on 
 line 123".
This we can easily test. What would the error look like when there is an error in a user class constructor called by a druntime template function, versus what it looks like currently? -Johan
Jul 24 2020
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 7/24/20 1:14 PM, Johan wrote:
 On Friday, 24 July 2020 at 14:06:28 UTC, Andrei Alexandrescu wrote:
 We need to template all we can of druntime. All those silly C-style 
 functions decaying to void* and dependencies and runtime type 
 information must go.
Reliance on runtime type information for object creation is not good, I agree. But that is easily solved by having the compiler make a call to the ctor (instead of implementing that call in druntime). Another option is to templatize, as you wrote.
Thanks for answering! I was deliberately exaggerating for dramatic purposes, so it's great to see such an even-keeled analytical response.
 I don't agree that everything should be moved to templates in druntime, 
 but I cannot formulate a strong argument against it (I also have not 
 seen strong arguments in favor of it though). Because "careful 
 consideration" is not a strong point of D development, I am wary.
 
 Templatizing will make certain types of functionality _harder_. If a 
 library/program wants to break, or hook into, object creation inside 
 user code, it will be impossible with templatized functions (you cannot 
 predict the templated function names of code that has not yet been 
 written). For example, AddressSanitizer can only work with 
 non-templatized allocation functions. If malloc was a template, it would 
 be _much_ harder to implement AddressSanitizer support (and currently 
 not possible while keeping source location in error messages).
I think there's a simple pro-template argument that I've run into often in my C++ projects: template to non-template is a one-way street, because going from static type information to non-static type information is a one-way street. You can go from a templated approach to a non-templated approach as easily as a one-liner. You can go from static composition to dynamic. You can do type erasure but not type "reconstruction". Going the other way is much more difficult, for obvious reasons.

That's why deferring the decision to lose type information is often a good stance for a designer; it offers maximum flexibility because you can easily revisit and override it. (This argument pattern goes for other things, too: making a fast subsystem safe vs. the converse comes to mind.)

Applied to druntime: if using runtime functions is desirable, the templates can just forward to them. So once we decide to use templates for fundamental runtime support, changing our mind selectively is trivial. Going the opposite way (which is what we're doing now) is a difficult path, with compiler changes and all.
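The "templates can just forward" point in code, with all names assumed for illustration (neither `_rt_newObj` nor `make` is an actual druntime API):

```d
// Keep an untyped runtime entry point, but reach it through a typed
// template. Erasing static type information is a one-liner...
extern (C) Object _rt_newObj(const TypeInfo_Class ti); // assumed name

T make(T)() if (is(T == class))
{
    // static type -> runtime TypeInfo: trivial, done at compile time
    return cast(T) _rt_newObj(typeid(T));
}
// ...while the reverse trip, recovering T from an untyped Object at
// run time, has no such one-liner. That is the one-way street.
```

Starting templated keeps both options open; starting with the erased C-style call keeps only one.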
 In favor of compiler-implemented code (instead of druntime) is that the 
 compiler is able to add debug information to generated code, linking the 
 instructions to source location in user code. I think this will only be 
 possible for druntime-implemented when: a) the code is force-inlined and 
 b) there is a way to disable debug information tagging of the 
 instructions inside the druntime-implemented function (such that the 
 source-location stays the same as the user's code). (a) is already 
 available in our compilers, (b) is not yet.
 
 Template bloat is a standard concern, because we do not have any defense 
 against it. Did anyone ever look into the array comparison case with 
 template bloat? Force-inlining _and_ not emitting into binary (currently 
 not available to user/druntime code) would fix that, although then maybe 
 someone else starts to argue about instruction count increase...
I'm tuned in to some of the C++ standardization mailing lists. Template bloat has come up in discussions of just about every template-related feature addition since 1990, occasionally with numbers. Over time these arguments died out: the ease of opting out of templates where appropriate, and the many tactical improvements made by compiler writers, have simply sent that argument to the dustbin of history.

All C++ features of any consequence in recent decades are imbued with genericity. It has been undeniable that going templated all the way has worked wonders for C++. We'd do well to learn from that.
Jul 24 2020
prev sibling parent Tim <tim.dlang t-online.de> writes:
On Thursday, 23 July 2020 at 00:47:21 UTC, Andrei Alexandrescu 
wrote:
 One problem I noticed with the current instrumentation of 
 allocations is that it is extremely slow.
Programs compiled with -profile=gc are much faster with this change: https://github.com/dlang/druntime/pull/3164

The problem was that GC.stats was called for every allocation, but GC.stats does more than necessary for -profile=gc. Unfortunately this change breaks the ABI of druntime.
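The idea behind that fix, sketched with hypothetical names (the actual PR changes druntime internals; this is only an illustration of the principle):

```d
// Per-allocation profiling only needs a cheap running counter, not the
// full statistics that GC.stats aggregates. (Names are hypothetical.)
struct AllocProfiler
{
    static size_t bytesAllocated; // O(1) to update and to read

    static void onAlloc(size_t sz) nothrow @nogc
    {
        // bump a counter instead of querying full GC statistics
        bytesAllocated += sz;
    }
}
```

Exposing such a dedicated accessor is what required the druntime ABI change mentioned above.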
Jul 24 2020