digitalmars.D - A modest proposal: eliminate template code bloat
- Dmitry Olshansky (45/45) Apr 08 2012 I think it's been ages since I meant to ask why nobody (as in compiler
- Marco Leise (6/61) Apr 08 2012 Thoughts? Nothing much. I thought of that a while ago, but as an externa...
- Dmitry Olshansky (11/13) Apr 08 2012 I think I this idea largely formed years ago when I was working with c++...
- Marco Leise (5/20) Apr 08 2012 I don't know compilers, but doesn't the IR level still distinguish betwe...
- H. S. Teoh (37/82) Apr 08 2012 This would be incompatible with how current (non-dmd) linkers work. But
- Marco Leise (6/8) Apr 08 2012 Executables (and object files) are made up mostly of sections, some of w...
- Dmitry Olshansky (20/66) Apr 08 2012 Easy the symbol size is in object file (obviously) but it surely not
- H. S. Teoh (13/51) Apr 08 2012 I agree!
- Dmitry Olshansky (44/61) Apr 08 2012 Right the advantage is that one doesn't have to use tricks. One can
- Andrei Alexandrescu (3/4) Apr 08 2012 You may always do.
- Dmitry Olshansky (7/18) Apr 08 2012 Mm btw there is one step from here to use it on arbitrary common slices
- Daniel Murphy (14/17) Apr 08 2012 I think you'll find that this is better done in the compiler instead of ...
- Dmitry Olshansky (20/39) Apr 09 2012 "Easy": just add a hidden pointer argument to functions that have merged...
- Somedude (5/19) Apr 08 2012 Actually, in C++ (as well as D), the added benefit would be a greatly
- Artur Skawina (22/55) Apr 09 2012 They already do.
- H. S. Teoh (6/26) Apr 09 2012 Exactly my point. I *want* to give incentive to toolchain devs to add
- Artur Skawina (6/9) Apr 08 2012 Don't forget that this needs to work:
- Dmitry Olshansky (4/13) Apr 08 2012 A reference to spec page plz.
- Artur Skawina (18/30) Apr 08 2012 A reference to *a* D spec, please...
- Andrei Alexandrescu (3/19) Apr 08 2012 Doesn't apply to C++.
- Artur Skawina (16/40) Apr 08 2012 Well, let's say it isn't specified. "Functions-are-not-objects" allows
- Dmitry Olshansky (13/42) Apr 08 2012 There is common sense and that is (in my book):
- Daniel Murphy (5/9) Apr 08 2012 Or use a nop slide before the start of the function. Since we're modify...
- H. S. Teoh (6/20) Apr 08 2012 [...]
- Marco Leise (5/26) Apr 08 2012 I would like to know that use case as well, especially since I learned c...
- Daniel Murphy (5/6) Apr 09 2012 Just because I can't think of a use case doesn't mean nobody is relying ...
- H. S. Teoh (10/20) Apr 09 2012 [...]
- Marco Leise (9/21) Apr 08 2012 Do you actually rely on that behavior? It is the same as asking this to ...
- Walter Bright (6/8) Apr 08 2012 I worked out how to do it a while ago, but there's been no time to imple...
- Dmitry Olshansky (7/18) Apr 08 2012 I'm thinking what if we kind of do it as an extension so that your
- H. S. Teoh (15/24) Apr 08 2012 [...]
I think it's been ages since I meant to ask why nobody (as in compiler vendors) does what I think is a rather simple optimization. In the short term the plan is to introduce a "link-time" flavored optimization at code generation or (better) at the link step. For simplicity let's assume the compiler does all of the following during generation of an object file.

1. Every time a function is generated (or pretty much any symbol), not only is its size calculated but also a checksum* of its data. (If we go for link-time optimization we should find a place to stick it into the object file.)

2. The compiler maintains a hash table of symbol_size ---> list (~ array) of pairs (reference to data, checksum) of all symbols with a given size. Call it a duplicate table. Every function generated, and more generally all global immutable data, should end up there.

3. After any function is generated, the compiler checks for an entry in the duplicate table that matches the size, followed by a matching checksum, and only then (if required) does a straight memcmp. If it happens that there is a match, the compiler just throws the generated code away and _aliases_ its symbol to that of the matched entry. (So there has to be an alias table if there isn't one already.)

*Yes, checksum. I think it should be a really simple and easily parallelized hash function. The checksum on its own need not be fully reliable, but some amount of balancing and profiling is obviously required when picking this function.

Applicability:
It's not only const/immutable bloat, which can be alleviated with inout. There are plenty of places where the exact same code is being generated: e.g. sealed containers of int vs uint, std.find on dchar[] vs uint[]/int[], and so on. In general, coarse-grained parametrization is the root of all evil, and it is inevitable since we are just humans after all.

Notes:
1. If we do the checksum calculation on the fly during codegen, it comes at virtually no cost as the data is in the CPU data cache. A preliminary version can avoid hacking this part of the backend though.
2. By _alias_ I mean the ability of the compiler to emit references to a given symbol as if it were some other symbol (should be really straightforward).
3. The linker has more data and is able to achieve colossal size savings, essentially by running through the same algorithm before actually linking things. Again, it's an easily separable step and can be opt-in.
4. Ironically, the exact same thing works with any kind of immutable data structure. It looks like string pooling is superseded by this proposal.

Thoughts?

--
Dmitry Olshansky
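To make the bookkeeping in steps 1-3 concrete, here is a minimal D sketch of such a duplicate table. The names (CodeBlob, DuplicateTable, intern) and the use of CRC32 are illustrative assumptions, not dmd's actual internals, and relocations are ignored entirely.

    import std.digest.crc : crc32Of;

    struct CodeBlob
    {
        string symbol;          // mangled name of the generated function
        const(ubyte)[] bytes;   // the generated code (or immutable data)
    }

    struct DuplicateTable
    {
        private CodeBlob[][size_t] bySize;   // symbol size -> candidates of that size
        private ubyte[4][string] checksums;  // per-symbol checksum, computed once
        string[string] aliases;              // dropped symbol -> kept symbol

        // Returns the symbol that references should be emitted against:
        // either the blob's own symbol, or an earlier, byte-identical one.
        string intern(CodeBlob blob)
        {
            immutable sum = crc32Of(blob.bytes);
            foreach (cand; bySize.get(blob.bytes.length, null))
            {
                // cheap checks first (size already matched, then checksum),
                // a full byte comparison only as the last resort
                if (checksums[cand.symbol] == sum && cand.bytes == blob.bytes)
                {
                    aliases[blob.symbol] = cand.symbol;  // throw the new copy away
                    return cand.symbol;
                }
            }
            bySize[blob.bytes.length] ~= blob;
            checksums[blob.symbol] = sum;
            return blob.symbol;
        }
    }

On a match the caller would emit no code for blob.symbol at all and route every reference (or the alias-table entry for the object-file writer) through the returned symbol.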
Apr 08 2012
On Sun, 08 Apr 2012 15:01:56 +0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

> [snip]
> Thoughts?

Thoughts? Nothing much. I thought of that a while ago, but as an external program that finds function calls by disassembling and removes dead/duplicate code. So I agree with you. A similar feature is a CTFE cache (or general code cache) that checksums a function's source code and gets the compiled version from a cache. Template bloat could be especially important to 'fix' on embedded systems. But I don't consider it important enough at the moment. :/ Let's wait till the bugs and important features are implemented, or hack the compiler ourselves.

--
Marco
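A rough sketch of the CTFE/code-cache aside: hash the function's source and reuse a previously compiled artifact if the hash is already on disk. The content-addressed cache layout and the compile callback are hypothetical, purely for illustration.

    import std.digest.sha : sha256Of;
    import std.digest : toHexString;
    import std.file : exists, mkdirRecurse, read, write;
    import std.path : buildPath;

    const(ubyte)[] compileOrFetch(string source,
                                  const(ubyte)[] delegate(string) compile,
                                  string cacheDir = "ctfe-cache")
    {
        mkdirRecurse(cacheDir);
        auto digest = sha256Of(source);     // ubyte[32] content hash of the source
        auto hex = toHexString(digest);     // char[64]
        string key = hex[].idup;            // usable as a file name
        auto path = buildPath(cacheDir, key);
        if (path.exists)
            return cast(const(ubyte)[]) read(path);  // cache hit: skip compilation
        auto compiled = compile(source);             // cache miss: do the real work
        write(path, compiled);
        return compiled;
    }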
Apr 08 2012
On 08.04.2012 16:37, Marco Leise wrote:
[snip]
> Template bloat could be especially important to 'fix' on embedded
> systems.

I think this idea largely formed years ago when I was working with C++ on 8-bit micros. You won't believe the amount of code size one can save by using one separate generic save-it-all-then-load-it-all prolog/epilog for all functions (and especially ISRs).

> Let's wait till the bugs and important features are implemented, or
> hack the compiler ourselves.

Let's hack the compiler ;) BTW I think it should be possible to apply the idea at the IR level, not on the actual machine code.

--
Dmitry Olshansky
Apr 08 2012
On Sun, 08 Apr 2012 20:58:15 +0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:

> Let's hack the compiler ;) BTW I think it should be possible to apply
> the idea at the IR level, not on the actual machine code.

I don't know compilers, but doesn't the IR level still distinguish between signed and unsigned, for example, even though the generated machine code will be the same? Anyway, that's what I had in mind with the caching of CTFE as well: create a hash over the IR (or AST?) of a function (or anything else) and store the hash and the serialized IR in a file next to the object files.

--
Marco
Apr 08 2012
On Sun, Apr 08, 2012 at 03:01:56PM +0400, Dmitry Olshansky wrote:

> In the short term the plan is to introduce a "link-time" flavored
> optimization at code generation or (better) at the link step.

This would be incompatible with how current (non-dmd) linkers work. But I do like the idea. Perhaps if it works well, other linkers will adopt it? (Just like how the gcc linker adopted duplicate template code elimination due to C++ templates.)

> 1. Every time a function is generated (or pretty much any symbol), not
> only is its size calculated but also a checksum* of its data. (If we
> go for link-time optimization we should find a place to stick it into
> the object file.)

We'd have to make sure the checksum doesn't end up in the final executable though, otherwise the bloat may negate any gains we've made.

> 3. After any function is generated, the compiler checks for an entry
> in the duplicate table that matches the size, followed by a matching
> checksum, and only then (if required) does a straight memcmp. If it
> happens that there is a match, the compiler just throws the generated
> code away and _aliases_ its symbol to that of the matched entry. (So
> there has to be an alias table if there isn't one already.)

I think you don't even need an alias table; IIRC the OS dynamic linker can easily handle symbols that have the same value (i.e. that point to the same place). All you have to do is to change the value of the duplicated symbols so that they all point to the same address.

> Applicability:
> [...] In general, coarse-grained parametrization is the root of all
> evil, and it is inevitable since we are just humans after all.

I'm not sure I understand the last sentence there, but duplicate code elimination is definitely a big plus. It will also mean that we can use templates more freely without having to worry about template bloat.

> 2. By _alias_ I mean the ability of the compiler to emit references to
> a given symbol as if it were some other symbol (should be really
> straightforward).

Like I said, I think this isn't even necessary, if the compiler can just generate the same value for the duplicated symbols.

> 3. The linker has more data and is able to achieve colossal size
> savings, essentially by running through the same algorithm before
> actually linking things. Again, it's an easily separable step and can
> be opt-in.

This assumes the (maybe external) linker knows how to take advantage of the info. But IMO, linkers *should* be a lot smarter than they currently are, so I don't see a problem with this as long as a dumb linker will still produce a working executable (just without the space savings).

Alternatively, we can have an external pre-link tool that scans a given set of object files and eliminates duplicated code (by turning duplicated symbols into external references in all but one of the instances) before we hand the files off to the OS's native linker. Come to think of it, this might be a good way to experiment with this idea before we commit lots of effort into integrating it with a real linker.

> 4. Ironically, the exact same thing works with any kind of immutable
> data structure. It looks like string pooling is superseded by this
> proposal.

Not really... string pooling can take advantage of overlapping (sub)strings, but I don't think you can do that with code. But I think your idea has a lot of merit. I'm for making linkers smarter than they currently are.

T

--
It always amuses me that Windows has a Safe Mode during bootup. Does that mean that Windows is normally unsafe?
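A sketch of such a pre-link dedup pass, assuming a hypothetical object-file reader: the Symbol/ObjectFile types below are stand-ins for a real ELF/COFF library, and relocations are ignored for brevity even though a real tool would have to compare them too.

    struct Symbol
    {
        string name;
        const(ubyte)[] code;    // section contents owned by this symbol
        string aliasOf;         // non-empty once folded into another symbol
    }

    struct ObjectFile
    {
        string path;
        Symbol[] symbols;
    }

    // Fold identical bodies across all object files: the first definition
    // wins, every later identical body is dropped and its symbol becomes an
    // alias (i.e. it gets the same address as the winner when linked).
    void dedupAcross(ObjectFile[] objs)
    {
        string[immutable(ubyte)[]] winners;   // code bytes -> symbol that keeps them
        foreach (ref obj; objs)
            foreach (ref sym; obj.symbols)
            {
                auto key = sym.code.idup;
                if (auto w = key in winners)
                {
                    sym.aliasOf = *w;         // duplicate: point at the kept copy
                    sym.code = null;          // and throw the generated code away
                }
                else
                    winners[key] = sym.name;
            }
    }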
Apr 08 2012
On Sun, 8 Apr 2012 07:18:26 -0700, "H. S. Teoh" <hsteoh quickfur.ath.cx> wrote:

> We'd have to make sure the checksum doesn't end up in the final
> executable though, otherwise the bloat may negate any gains we've made.

Executables (and object files) are made up mostly of sections, some of which are special-cased to contain the code, zero-initialized data, thread-local storage etc., and some user-defined. The checksums would most probably end up in their own section, as already happens for debug info or comments. Using a tool like strip you can remove any section by its name. Once a linker knows how to use the checksums, it would strip them by default.

--
Marco
Apr 08 2012
On 08.04.2012 18:18, H. S. Teoh wrote:
[snip]
> We'd have to make sure the checksum doesn't end up in the final
> executable though, otherwise the bloat may negate any gains we've made.

Easy: the symbol size is in the object file (obviously) but it surely isn't present in the executable. If I'm worth anything, legacy formats have plenty of slack space reserved for the future. Now is the future :)

[snip]
> I think you don't even need an alias table; IIRC the OS dynamic linker
> can easily handle symbols that have the same value (i.e. that point to
> the same place). All you have to do is to change the value of the
> duplicated symbols so that they all point to the same address.

Nice to know.

> I'm not sure I understand the last sentence there, but duplicate code
> elimination is definitely a big plus. It will also mean that we can
> use templates more freely without having to worry about template bloat.

It's easy - define a template on type T. Code it up. Now, how many times did you consider that e.g. you can parametrize on the size of the type instead of the type itself? I'm making the point that it's almost always the case with sealed containers of PODs, for instance. (Now multiply the argument by the total number of parameters.)

> Like I said, I think this isn't even necessary, if the compiler can
> just generate the same value for the duplicated symbols.

OK, I'm not much into how *exactly* the linker works these days. I know the basics though.

> Not really... string pooling can take advantage of overlapping
> (sub)strings, but I don't think you can do that with code. But I think
> your idea has a lot of merit. I'm for making linkers smarter than they
> currently are.

Sorry, it's just me running ahead of the train somewhat. Basically, once this initial version is in place I have one cool refinement for it in mind. For now we just need to keep the hash function transitive and associative, for the gods' sake. 128/64-bit checksum please ;)

--
Dmitry Olshansky
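For illustration, this is the kind of manual size-based parametrization one currently has to write by hand to get the sharing; the RawStack/Stack pair below is a hypothetical sketch, not a Phobos container. Automatic merging would make the plain typed version just as small.

    // All the real code lives in RawStack!(T.sizeof), so Stack!int,
    // Stack!uint and Stack!float share one instantiation of it.
    struct RawStack(size_t elemSize)
    {
        private ubyte[] store;

        void pushRaw(const(ubyte)* p)
        {
            store ~= p[0 .. elemSize];
        }

        void popRaw(ubyte* p)
        {
            p[0 .. elemSize] = store[$ - elemSize .. $];
            store.length -= elemSize;
        }
    }

    // Thin typed wrapper: keeps the user-facing API type-safe.
    struct Stack(T)
    {
        private RawStack!(T.sizeof) impl;
        void push(T value) { impl.pushRaw(cast(const(ubyte)*) &value); }
        T pop() { T v; impl.popRaw(cast(ubyte*) &v); return v; }
    }

    unittest
    {
        Stack!int a;
        Stack!uint b;       // reuses the same RawStack!4 code as Stack!int
        a.push(1); b.push(2u);
        assert(a.pop() == 1 && b.pop() == 2u);
    }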
Apr 08 2012
On Sun, Apr 08, 2012 at 08:45:19PM +0400, Dmitry Olshansky wrote:

> Easy: the symbol size is in the object file (obviously) but it surely
> isn't present in the executable.

I agree!

[...]

> It's easy - define a template on type T. Code it up. Now, how many
> times did you consider that e.g. you can parametrize on the size of
> the type instead of the type itself? I'm making the point that it's
> almost always the case with sealed containers of PODs, for instance.

Yeah, that's what I was thinking of. This would be a very big gain for the new AA implementation, for example. I wouldn't have to worry so much about template bloat if most of the instantiations are going to get merged anyway. :-)

[...]

> Sorry, it's just me running ahead of the train somewhat. Basically,
> once this initial version is in place I have one cool refinement for
> it in mind. For now we just need to keep the hash function transitive
> and associative, for the gods' sake. 128/64-bit checksum please ;)

And what would the cool refinement be? :-)

T

--
It's bad luck to be superstitious. -- YHL
Apr 08 2012
On 08.04.2012 21:24, H. S. Teoh wrote:

> Yeah, that's what I was thinking of. This would be a very big gain for
> the new AA implementation, for example. I wouldn't have to worry so
> much about template bloat if most of the instantiations are going to
> get merged anyway. :-)

Right, the advantage is that one doesn't have to use tricks. One can simply assume the compiler is doing its job. That's what happened with inlining some 10+ years ago.

> And what would the cool refinement be? :-)

OK, you sort of forced my hand. I hope you have been warned about spoilers before ;)

The refinement is merging prefixes and suffixes, of course. And for that one needs to calculate hashes for all of the prefixes and all of the suffixes. I will define _all_ later on.

The first observation is that if you have calculated partial checksums for prefixes, you have the sums for all suffixes and vice versa. Namely:

    // prefix ends at i, suffix starts at i
    sum_prefix[i] = total - sum_suffix[i];

which is derived from the fact that

    total = sum_prefix[i] + sum_suffix[i];

(taking into account the properties of +/- in the finite field).

Now take the original idea, substitute the checksum with an array of partial sums and lengths (btw, lengths follow the same prefix<->suffix rule), and you know what this refinement is all about.

In fact this all was easy; the trick is fitting this into the time frame of compilation. To that end one should do full duplicate elimination first, then mess with merging. Then we see that generally we are about to do O((M1*...*Mn)^2) work, where n is the number of functions and Mi is the number of prefixes (= suffixes) we defined for each. The constant C in front of (M1...Mn)^2 here is not that big - it's comparing checksums & lengths, and only *sometimes* a memcmp - but keep in mind that the number of all functions is big. So we have to get around this monstrous workload.

At the moment the only way I see of doing this is to use coarse-grained prefixes and to introduce some threshold on the maximum number of prefixes we account for. That's about defining the above-mentioned _all_ for prefixes. Coarse-grained means we store partial checksums only on a fixed block boundary (say every 16 or 32 bytes) if that aligns properly with instruction boundaries; if not, we just skip this prefix and move on. (This also hopefully limits memory usage of the partial-sums array.) Another threshold is that we don't mess with partial sums for really BIG functions. (Might not be needed.)

I think there is some space for other heuristics to try out, but they are mostly along the same lines.

P.S. Damn, I could have done a nice paper on that... too late :)

--
Dmitry Olshansky
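A small self-contained check of the prefix/suffix observation, using plain wraparound addition as the (associative, invertible) checksum; a real implementation would use whatever additive hash the compiler settles on.

    uint[] prefixSums(const(ubyte)[] code)
    {
        auto sums = new uint[code.length + 1];
        foreach (i, b; code)
            sums[i + 1] = sums[i] + b;      // checksum of code[0 .. i+1], mod 2^32
        return sums;
    }

    unittest
    {
        const(ubyte)[] code = [0x55, 0x48, 0x89, 0xe5, 0x5d, 0xc3];
        auto pre = prefixSums(code);
        immutable total = pre[$ - 1];

        // the suffix checksum starting at i falls out of the prefix sums for free
        foreach (i; 0 .. code.length + 1)
        {
            uint suffix = 0;
            foreach (b; code[i .. $]) suffix += b;
            assert(suffix == total - pre[i]);
        }
    }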
Apr 08 2012
On 4/8/12 1:49 PM, Dmitry Olshansky wrote:

> P.S. Damn, I could have done a nice paper on that... too late :)

You may always do.

Andrei
Apr 08 2012
On 08.04.2012 22:49, Dmitry Olshansky wrote:

> The first observation is that if you have calculated partial checksums
> for prefixes, you have the sums for all suffixes and vice versa.
> Namely:
>
>     // prefix ends at i, suffix starts at i
>     sum_prefix[i] = total - sum_suffix[i];

Mm, btw, there is one step from here to using it on arbitrary common slices (i, j) where i < j:

    slice_sum(i, j) = sum_prefix[j] - sum_prefix[i]

P.S. (Goes for a walk citing a dynamic programming maxim)

--
Dmitry Olshansky
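A quick self-contained check of the slice identity, under the same toy additive checksum as in the earlier sketch:

    unittest
    {
        const(ubyte)[] code = [0x55, 0x48, 0x89, 0xe5, 0x5d, 0xc3];
        auto pre = new uint[code.length + 1];
        foreach (k, b; code)
            pre[k + 1] = pre[k] + b;                  // running wraparound checksum

        foreach (i; 0 .. code.length)
            foreach (j; i .. code.length + 1)
            {
                uint expected = 0;
                foreach (b; code[i .. j]) expected += b;
                assert(expected == pre[j] - pre[i]);  // slice_sum(i, j)
            }
    }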
Apr 08 2012

"Dmitry Olshansky" <dmitry.olsh gmail.com> wrote in message news:jlsmka$22ce$1 digitalmars.com...

> The refinement is merging prefixes and suffixes, of course. And for
> that one needs to calculate hashes for all of the prefixes and all of
> the suffixes. I will define _all_ later on.

I think you'll find that this is better done in the compiler instead of the linker. Merging prefixes is problematic because at some point you will need to work out which tail to execute, so you'll always need to modify the generated code. Merging suffixes is easier: you can merge all returning blocks with identical code, and then merge all blocks that always jump to the same blocks, etc. This will need to happen after code generation if you want to merge int/uint code, which would be difficult in dmd's back end. Merging functions with identical bodies is of course much easier, and can be done in the linker without needing to modify any code (just the relocations).
Apr 08 2012
On 09.04.2012 5:11, Daniel Murphy wrote:

> I think you'll find that this is better done in the compiler instead of
> the linker. Merging prefixes is problematic because at some point you
> will need to work out which tail to execute, so you'll always need to
> modify the generated code.

"Easy": just add a hidden pointer argument to functions that have a merged prefix (call it dispatch). The prefix part of the code is followed by an indirect jump through this pointer. The compiler arranges it so that every time the function is called, the correct dispatch address is passed behind the scenes. BTW, there are no extra checks and such; it's one naked indirect jump, and it's totally predictable, unlike, say, a switch jump. (Well, unless you use a few copy-paste-susceptible functions in the same loop that turn out to have their prefixes merged.) It still implies that prefix merging should be applied with more care than suffix merging. Once this is tested and working, even merging arbitrary parts of functions is doable with this approach.

> Merging suffixes is easier: you can merge all returning blocks with
> identical code, and then merge all blocks that always jump to the same
> blocks, etc. This will need to happen after code generation if you want
> to merge int/uint code, which would be difficult in dmd's back end.

Any chance to fit this into the IR --> CodeGen step? Like using an alternative comparison rather than memcmp. Or, better, do basically the same algorithm but on map!(....)(IR) that morphs things so that some identical ops (e.g. uint/int == and !=) are considered the same. In fact this might be even faster than generating useless machine code!

--
Dmitry Olshansky
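A source-level illustration of the hidden dispatch argument; the real transformation would be done by the backend on generated code, and the names and the split point below are made up.

    alias Tail = int function(int state);

    // The shared prefix of several formerly-duplicated functions, ending in
    // an indirect transfer (here: a plain indirect call in tail position).
    int mergedPrefix(int x, Tail dispatch)
    {
        int state = x * 2 + 1;      // work that was identical in every copy
        return dispatch(state);     // "naked indirect jump" to the right tail
    }

    // The parts that actually differed between the instantiations.
    int tailSigned(int s)   { return s < 0 ? -s : s; }
    int tailUnsigned(int s) { return s; }

    // What the original entry points shrink to: pass the right dispatch address.
    int fSigned(int x)   { return mergedPrefix(x, &tailSigned); }
    int fUnsigned(int x) { return mergedPrefix(x, &tailUnsigned); }

    unittest
    {
        assert(fSigned(-3) == 5);    // |-3*2 + 1|
        assert(fUnsigned(3) == 7);   //  3*2 + 1
    }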
Apr 09 2012
On 08/04/2012 16:18, H. S. Teoh wrote:

> This would be incompatible with how current (non-dmd) linkers work.
> But I do like the idea. Perhaps if it works well, other linkers will
> adopt it? (Just like how the gcc linker adopted duplicate template
> code elimination due to C++ templates.)

Actually, in C++ (as well as D), the added benefit would be greatly improved compilation speed, wouldn't it? I bet if the idea works in D and demonstrably speeds up compilation, compiler writers would be very compelled to implement it in C++.
Apr 08 2012
On 04/09/12 08:21, Somedude wrote:

> I bet if the idea works in D and demonstrably speeds up compilation,
> compiler writers would be very compelled to implement it in C++.

They already do. It's a very simple and trivial optimization; the question is only about programmer expectations. Every (memory) object having a unique address *is* a valuable feature with clear benefits. (C++ has functions as non-objects, which is why the compilers can get away with the optimization.) Note that that does not actually mean that everything has to be placed at a unique address -- it only needs to behave *AS IF*, as long as the program can't tell the difference.

On 04/09/12 02:59, Daniel Murphy wrote:

> Or use a nop slide before the start of the function. Since we're
> modifying the object file format anyway, it would be trivial for the
> compiler to mark functions which have their address taken as needing a
> unique address.

Nice idea. Given today's amounts of alignment nops emitted, it would usually be completely free. But I now think the optimization would be OK, and should even be on by default, for the case where the identical code sequence was generated from an identical token sequence. That would handle the template bloat issue while avoiding most of the problems; having non-unique addresses for this case should be harmless and would just need to be properly documented. It's only the random-completely-unrelated-function replacement that is problematic - think of such functions randomly appearing in the call chain, confusing both downstream code and programmers looking at backtraces or perf profiles, and breakpoints that magically appear out of nowhere at random.

artur
Apr 09 2012
On Mon, Apr 09, 2012 at 08:21:08AM +0200, Somedude wrote:

> Actually, in C++ (as well as D), the added benefit would be greatly
> improved compilation speed, wouldn't it? I bet if the idea works in D
> and demonstrably speeds up compilation, compiler writers would be very
> compelled to implement it in C++.

Exactly my point. I *want* to give incentive to toolchain devs to add these kinds of enhancements to linkers in general.

T

--
Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- Michael Beibl
Apr 09 2012
On 04/08/12 13:01, Dmitry Olshansky wrote:

> 3. After any function is generated, the compiler checks for an entry
> in the duplicate table that matches the size, followed by a matching
> checksum, and only then (if required) does a straight memcmp. If it
> happens that there is a match, the compiler just throws the generated
> code away and _aliases_ its symbol to that of the matched entry.
[...]
> Thoughts?

Don't forget that this needs to work:

    static auto f(T)(T a) { return a; }
    assert(cast(void*)&f!int != cast(void*)&f!uint);

artur
Apr 08 2012
On 08.04.2012 18:21, Artur Skawina wrote:

> Don't forget that this needs to work:
>
>     static auto f(T)(T a) { return a; }
>     assert(cast(void*)&f!int != cast(void*)&f!uint);

A reference to a spec page, plz.

--
Dmitry Olshansky
Apr 08 2012
On 04/08/12 17:20, Dmitry Olshansky wrote:

> A reference to a spec page, plz.

A reference to *a* D spec, please... There isn't one, but that does not mean that common sense does not need to apply. Do you really suggest making different template instantiations identical, just because the compiler happened to emit the same code? The situation is not very different from

    const int a = 1;
    const uint b = 1;
    assert(&a != &b);

The user can take the addresses, compare them, use them as AA keys, set breakpoints, etc.

Note that my point is just that the compiler needs to emit a dummy so that the addresses remain unique, eg

    module.f!uint: jmp module.f!int

which is only used when taking the address. Even calling f!int() instead of the uint version could be acceptable, as long as there is a compiler option to turn this optimization off (think breakpoints).

artur
Apr 08 2012
On 4/8/12 10:59 AM, Artur Skawina wrote:

> A reference to *a* D spec, please... There isn't one, but that does
> not mean that common sense does not need to apply.

Doesn't apply to C++.

Andrei
Apr 08 2012
On 04/08/12 18:14, Andrei Alexandrescu wrote:

> Doesn't apply to C++.

Well, let's say it isn't specified. "Functions-are-not-objects" allows the compilers that merge the bodies to get away with it, because it's not explicitly illegal. But that does not mean that returning a different random pointer, which just happens to point to another identical code sequence, is a good idea.

On 04/08/12 18:30, Dmitry Olshansky wrote:

> There is common sense, and that is (in my book): don't tie up the
> compiler's hands for no benefit.

In this case the benefit from not requiring unique function addresses isn't very large (it only matters iff the address is taken). Hmm, as long as the returned pointer points to functionally identical code, often everything will still be fine. In other cases, doing this optimization *at all* will cause trouble. And I think it should be done (the benefits are obvious). A "unique" function attribute would be the solution; that would let us eat the cake, except when we want to have it. :)

artur
Apr 08 2012
On 08.04.2012 19:59, Artur Skawina wrote:

> A reference to *a* D spec, please...

Yeah, sorry.

> There isn't one, but that does not mean that common sense does not
> need to apply.

There is common sense, and that is (in my book): don't tie up the compiler's hands for no benefit.

> Do you really suggest making different template instantiations
> identical, just because the compiler happened to emit the same code?
> The situation is not very different from
>
>     const int a = 1;
>     const uint b = 1;
>     assert(&a != &b);

I wouldn't expect this assert to always hold. Moreover (all things being equal) I would consider taking the address of constant integers a poor practice.

> The user can take the addresses, compare them, use them as AA keys,
> set breakpoints, etc.

Yes, that's what I call poor practice (I mean, a function pointer as an AA _key_, seriously?). As for breakpoints, obviously one debugs a non-optimized program (at least most of the time).

> Note that my point is just that the compiler needs to emit a dummy so
> that the addresses remain unique, eg
>
>     module.f!uint: jmp module.f!int

Could work as a conservative option. But I don't think it's justified.

> Even calling f!int() instead of the uint version could be acceptable,
> as long as there is a compiler option to turn this optimization off
> (think breakpoints).

Yup, that's what optimizations are.

--
Dmitry Olshansky
Apr 08 2012

"Artur Skawina" <art.08.09 gmail.com> wrote in message news:mailman.1480.1333900846.4860.digitalmars-d puremagic.com...

> Note that my point is just that the compiler needs to emit a dummy so
> that the addresses remain unique, eg
>
>     module.f!uint: jmp module.f!int

Or use a nop slide before the start of the function. Since we're modifying the object file format anyway, it would be trivial for the compiler to mark functions which have their address taken as needing a unique address.
Apr 08 2012
On Mon, Apr 09, 2012 at 10:59:26AM +1000, Daniel Murphy wrote:

> Or use a nop slide before the start of the function. Since we're
> modifying the object file format anyway, it would be trivial for the
> compiler to mark functions which have their address taken as needing a
> unique address.

[...]

Why is it so important to have unique addresses for functions?

T

--
BREAKFAST.COM halted... Cereal Port Not Responding. -- YHL
Apr 08 2012
On Sun, 8 Apr 2012 19:14:22 -0700, "H. S. Teoh" <hsteoh quickfur.ath.cx> wrote:

> Why is it so important to have unique addresses for functions?

I would like to know that use case as well, especially since I learned coding with the "copy & paste code is bad" phrase. It just seems odd to me to expect any constant data/code to have unique addresses when they can be merged or overlapped.

--
Marco
Apr 08 2012

"H. S. Teoh" <hsteoh quickfur.ath.cx> wrote in message news:mailman.1518.1333937643.4860.digitalmars-d puremagic.com...

> Why is it so important to have unique addresses for functions?

Just because I can't think of a use case doesn't mean nobody is relying on it! But I guess there really isn't one.
Apr 09 2012
On Mon, Apr 09, 2012 at 11:58:01PM +1000, Daniel Murphy wrote:

> Just because I can't think of a use case doesn't mean nobody is
> relying on it! But I guess there really isn't one.

Somebody brought up the matter of stack traces, which could be a valid concern, I suppose, although I'm tempted to just say: use a non-optimized build for debugging purposes. (But I suppose that is arguable.)

T

--
Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
Apr 09 2012
On Sun, 08 Apr 2012 16:21:14 +0200, Artur Skawina <art.08.09 gmail.com> wrote:

> Don't forget that this needs to work:
>
>     static auto f(T)(T a) { return a; }
>     assert(cast(void*)&f!int != cast(void*)&f!uint);

Do you actually rely on that behavior? It is the same as asking this to work, I think:

    string a = "abc";
    string b = "abcdef";
    assert(a.ptr !is b.ptr);

There should be ways to logically check the inequality of your two functions other than by comparing their memory addresses.

--
Marco
Apr 08 2012
On 4/8/2012 4:01 AM, Dmitry Olshansky wrote:

> I think it's been ages since I meant to ask why nobody (as in compiler
> vendors) does what I think is a rather simple optimization.

I worked out how to do it a while ago, but there's been no time to implement it. (You can't do a memcmp because of all the relocations.)

The main difficulty is not being able to modify the linker. So you're pretty much limited to what the compiler is able to do before linking. D does allow the compiler to deal with all the modules at compile time, so this helps a lot.
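Walter's parenthetical is the crux: two byte-identical functions that call different (or not-yet-resolved) targets differ exactly in the spans patched by relocations, so a comparison has to skip those spans and compare the relocation targets symbolically instead. A hedged sketch, with a made-up Reloc record and relocations assumed sorted by offset:

    struct Reloc
    {
        size_t offset;   // where in the code the fixup is written
        size_t size;     // how many bytes it patches
        string target;   // symbol the fixup refers to
    }

    bool sameCode(const(ubyte)[] a, const Reloc[] ra,
                  const(ubyte)[] b, const Reloc[] rb)
    {
        if (a.length != b.length || ra.length != rb.length)
            return false;

        size_t pos = 0;
        foreach (i, r; ra)
        {
            auto s = rb[i];
            // fixups must line up and point at the same (or already merged) symbol
            if (r.offset != s.offset || r.size != s.size || r.target != s.target)
                return false;
            // compare the raw bytes between the previous fixup and this one
            if (a[pos .. r.offset] != b[pos .. r.offset])
                return false;
            pos = r.offset + r.size;
        }
        return a[pos .. $] == b[pos .. $];   // trailing bytes after the last fixup
    }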
Apr 08 2012
On 08.04.2012 22:51, Walter Bright wrote:

> I worked out how to do it a while ago, but there's been no time to
> implement it. (You can't do a memcmp because of all the relocations.)

Yes, I thought there must have been some problem with it.

> The main difficulty is not being able to modify the linker. So you're
> pretty much limited to what the compiler is able to do before linking.
> D does allow the compiler to deal with all the modules at compile
> time, so this helps a lot.

I'm thinking: what if we do it as an extension, so that your linker (e.g. optlink) knows about these checksums (or whatever you use instead), while a normal linker just works the way it always did?

--
Dmitry Olshansky
Apr 08 2012
On Sun, Apr 08, 2012 at 10:56:43PM +0400, Dmitry Olshansky wrote:

> I'm thinking: what if we do it as an extension, so that your linker
> (e.g. optlink) knows about these checksums (or whatever you use
> instead), while a normal linker just works the way it always did?

I agree. Either that, or have an external utility for doing it. I must say that IMO current linker technology is quite dumb, and no harm could come from experimenting with smarter linkers. Things like duplicate template instantiation elimination have been introduced due to the influence of C++; I don't see why D shouldn't introduce its own advancements to linkers too. :-)

T

--
Laissez-faire is a French term commonly interpreted by Conservatives to mean 'lazy fairy,' which is the belief that if governments are lazy enough, the Good Fairy will come down from heaven and do all their work for them.
Apr 08 2012