digitalmars.D - RFC: generic safety, specialized implementation?

=?UTF-8?B?THXDrXM=?= Marques (103/103) Jan 19 2018 Consider some C code like this:

Adam D. Ruppe (9/12) Jan 19 2018 This is actually the way a lot of stuff is implemented in d
H. S. Teoh (29/48) Jan 19 2018 [...]
Timothee Cour (9/58) Jan 19 2018 this is very related to ICF (identical code folding), which some linkers...

=?UTF-8?B?THXDrXM=?= Marques <luis luismarques.eu> writes:

Consider some C code like this:

     struct Foo
     {
         struct Foo *next;
         int bar;
     };

     struct Bar
     {
         struct Bar *next;
         char* name;
     };

     struct Unrelated
     {
         int x;
         int y;
     };

     void combulate(void *item)
     {
         struct ItemWithNext
         {
             struct ItemWithNext *next;
         };

         struct ItemWithNext *inext = item;

         // ... lots of code here
         inext->next = ...;
     }

     int main()
     {
         struct Foo foo;
         struct Bar bar;
         struct Unrelated unrelated;
         combulate(&foo);
         combulate(&bar);
         combulate(&unrelated); // bug
     }

The function `combulate` must use a `void*` parameter to accept 
any argument that structurally conforms to the interface it 
expects -- in this case, having a `next` field as its first 
member. That has the disadvantage that the type system doesn't 
catch the mistake of passing it an Unrelated value, which doesn't 
conform to the expected binary interface. In D we might instead 
do something like this:

     struct Foo
     {
         Foo* next;
         int bar;
     }

     struct Bar
     {
         Bar* next;
         char* name;
     }

     struct Unrelated
     {
         int x;
         int y;
     };

     void combulate(T)(T* item)
     {
         // ... lots of code here
         item.next = ...;
     }

     void main()
     {
         Foo foo;
         Bar bar;
         Unrelated unrelated;
         combulate(&foo);
         combulate(&bar);
         combulate(&unrelated); // compilation error
     }

This catches the bug, but will have the disadvantage of 
generating code for the various types to which combulate is 
specialized, even though the body of the function template 
doesn't rely on anything that varies between the specializations. 
That is problematic in the context where I'm using this (embedded 
systems). So instead I've started using a mixed approach, with 
generic code that checks for the appropriate type, but delegates 
the actual work to a non-templated function (a fully specialized 
function template in my actual case), except in the cases where 
the actual code is small enough or the specialization 
significantly improves the performance. Something like this:

     private struct ItemWithNext
     {
         ItemWithNext *next;
     }

     private void combulate(ItemWithNext* item)
     {
         // ... lots of code here
         item.next = ...;
     }

     pragma(inline, true)
     void combulate(T)(T* item)
         if(someKindOfCheck!item)
     {
         combulate(cast(ItemWithNext*) item);
     }

So far this seems to be working well for me. Do you have 
experience writing this kind of code? Do you have any advice that 
might be relevant to this situation?

PS: I know I don't have to define and use a structure to access 
the next field, but I feel like that generalizes better to other 
scenarios and is clearer.

Jan 19 2018

Adam D. Ruppe <destructionator gmail.com> writes:

On Friday, 19 January 2018 at 19:18:22 UTC, Luís Marques wrote:
 So far this seems to be working well for me. Do you have 
 experience writing this kind of code? Do you have any advice 
 that might be relevant to this situation?

This is actually the way a lot of stuff is implemented in d 
runtime code, like the array helper functions (the interface done 
via a bit of magic in the compiler rather than templates mostly 
tho). I find it works in some cases with a bit smaller binary, 
little runtime speed change, and it can sometimes help compile 
speeds.

Of course, you can also sometimes use interfaces and classes to 
achieve this same goal.

Jan 19 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Jan 19, 2018 at 07:18:22PM +0000, Lu�s Marques via Digitalmars-d wrote:
[...]
     void combulate(T)(T* item)
     {
         // ... lots of code here
         item.next = ...;
     }

[...]
[...]
 This catches the bug, but will have the disadvantage of generating
 code for the various types to which combulate is specialized, even
 though the body of the function template doesn't rely on anything that
 varies between the specializations.

Yeah, I think this is one area where the way the compiler implements
templates could be improved.  Quite often, in a complex template
(whether it's a template function or a template aggregate like a struct
or class) only a subset of the code actually depends on the template
arguments. Nevertheless, the compiler duplicates the entirety of the
code in the generated template. This leads to a lot of template bloat.

I'm not sure how possible it is in the current compiler implementation,
but it would be nice if the compiler were a bit smarter about
instantiating templates.  If an instantiated function (could be any
subset of the code, but granularity at the function level is probably
easiest to implement) would not result in code that differs from other
instantiations in the generated IR, only emit the code if it hasn't been
emitted yet; otherwise just alias that particular instantiation to the
previous instantiation.

Perhaps an enhancement request could be filed for this.


[...]
 That is problematic in the context where I'm using this (embedded
 systems). So instead I've started using a mixed approach, with generic
 code that checks for the appropriate type, but delegates the actual
 work to a non-templated function (a fully specialized function
 template in my actual case), except in the cases where the actual code
 is small enough or the specialization significantly improves the
 performance. Something like this:

[...]
 So far this seems to be working well for me. Do you have experience
 writing this kind of code? Do you have any advice that might be
 relevant to this situation?

[...]

Yeah, basically, this is just doing what I described above by hand.
I've done similar refactorings before, to reduce template bloat. It
certainly seems to work well.  However, it would be nice if the compiler
automated such rote work for us.


T

-- 
Help a man when he is in trouble and he will remember you when he is in trouble
again.

Jan 19 2018

Timothee Cour <thelastmammoth gmail.com> writes:

this is very related to ICF (identical code folding), which some linkers do.

in summary: gold and lld have options to do that, and it can be unsafe
to do if code relies on each function having distinct address;
 gold's --icf=safe option only enables ICF for functions that can be proven not
to have their address taken, so code that relies on distinct addresses will
still work.


links:
https://stackoverflow.com/questions/38662972/does-the-linker-usually-optimize-away-duplicated-code-from-different-c-templat
https://stackoverflow.com/questions/15168924/gcc-clang-merging-functions-with-identical-instructions-comdat-folding
https://releases.llvm.org/3.9.0/tools/lld/docs/ReleaseNotes.html


On Fri, Jan 19, 2018 at 11:29 AM, H. S. Teoh via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Fri, Jan 19, 2018 at 07:18:22PM +0000, Luís Marques via Digitalmars-d
wrote:
 [...]
     void combulate(T)(T* item)
     {
         // ... lots of code here
         item.next = ...;
     }

 [...]
 [...]
 This catches the bug, but will have the disadvantage of generating
 code for the various types to which combulate is specialized, even
 though the body of the function template doesn't rely on anything that
 varies between the specializations.

 Yeah, I think this is one area where the way the compiler implements
 templates could be improved.  Quite often, in a complex template
 (whether it's a template function or a template aggregate like a struct
 or class) only a subset of the code actually depends on the template
 arguments. Nevertheless, the compiler duplicates the entirety of the
 code in the generated template. This leads to a lot of template bloat.

 I'm not sure how possible it is in the current compiler implementation,
 but it would be nice if the compiler were a bit smarter about
 instantiating templates.  If an instantiated function (could be any
 subset of the code, but granularity at the function level is probably
 easiest to implement) would not result in code that differs from other
 instantiations in the generated IR, only emit the code if it hasn't been
 emitted yet; otherwise just alias that particular instantiation to the
 previous instantiation.

 Perhaps an enhancement request could be filed for this.


 [...]
 That is problematic in the context where I'm using this (embedded
 systems). So instead I've started using a mixed approach, with generic
 code that checks for the appropriate type, but delegates the actual
 work to a non-templated function (a fully specialized function
 template in my actual case), except in the cases where the actual code
 is small enough or the specialization significantly improves the
 performance. Something like this:

 [...]
 So far this seems to be working well for me. Do you have experience
 writing this kind of code? Do you have any advice that might be
 relevant to this situation?

 [...]

 Yeah, basically, this is just doing what I described above by hand.
 I've done similar refactorings before, to reduce template bloat. It
 certainly seems to work well.  However, it would be nice if the compiler
 automated such rote work for us.


 T

 --
 Help a man when he is in trouble and he will remember you when he is in
trouble again.

Jan 19 2018

D Programming

C/C++ Programming

Other

digitalmars.D - RFC: generic safety, specialized implementation?