D - Inlining
- Helmut Leitner (11/11) Apr 21 2003 What do we know about inlining except that the compiler
- Matthew Wilson (17/28) Apr 21 2003 afaik, it is entirely up to the compiler, which is where it should be in
- Ilya Minkov (4/24) Apr 21 2003 Why not inline(always), inline(prefer), inline(never),
- Matthew Wilson (3/27) Apr 21 2003 Sounds ok to me
- Walter (5/13) Apr 24 2003 Think of inlining like the obsolete register keyword in C.
- Mark T (6/11) Apr 25 2003 I agree, in the future most D compilers could have various compile-for-s...
- Walter (9/15) Apr 25 2003 compile-for-speed and
- Scott Wood (16/20) Apr 27 2003 It'd still be nice to have a way of explicitly saying that a function
- Ilya Minkov (8/18) Apr 28 2003 Who says the register keyword is useless?
- Walter (22/41) May 03 2003 I suspect those functions are heavilly dependent on how a *particular*
- Scott Wood (41/66) May 04 2003 Not particularly, at least in the case of the scheduler. The
- Walter (24/66) May 07 2003 The compiler does not optimize inline assembly that you write. Therefore...
- Scott Wood (81/119) May 07 2003 I suppose, though it'd be a little awkward to use the assembler just
- Walter (41/152) May 08 2003 I'd agree with that.
- C (20/41) May 07 2003 If that means what I think is intended, should 'break' be more
- Scott Wood (48/117) May 08 2003 Except that in this case, using inline assembler would have made it
- Walter (43/121) May 09 2003 a
- Scott Wood (72/132) May 09 2003 For the default case, sure. I'll wait until compilers have a full,
- Ilya Minkov (9/12) May 10 2003 Why are you using GAS? You can use NASM (or maybe FASM) instead! Both
- Nic Tiger (29/41) May 10 2003 I did find reliable way to use NASM with Digital Mars for Win32 and DOSX
What do we know about inlining except that the compiler will do it when it feels so? Is there a guarantee that a simple macro-like definition like

    void MemClear(char *p,int size) { memset(p,0,size); }

will be inlined? What if this goes through multiple levels?

--
Helmut Leitner leitner hls.via.at
Graz, Austria www.hls-software.com
Apr 21 2003
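Editorial aside, not part of the original post: the question can be made concrete with a minimal C sketch. In C, `static inline` is only a request, and that holds at every level of nesting. `struct Record`, `ClearRecord` and `record_is_cleared` are hypothetical names added for illustration.

```c
#include <assert.h>
#include <string.h>

/* The wrapper from the post. 'static inline' is only a hint: the
   compiler may expand the call in place or emit an ordinary call. */
static inline void MemClear(char *p, int size)
{
    assert(p != NULL && size >= 0);
    memset(p, 0, (size_t)size);
}

/* Hypothetical second level, mirroring "multiple levels". */
struct Record { char name[16]; int id; };

static inline void ClearRecord(struct Record *r)
{
    MemClear((char *)r, (int)sizeof *r);
}

/* Returns 1 if the record is zeroed after clearing. The result is the
   same whether or not any of the calls above were actually inlined. */
int record_is_cleared(void)
{
    struct Record r = { "abc", 42 };
    ClearRecord(&r);
    return r.id == 0 && r.name[0] == '\0';
}
```

The observable behaviour is identical either way. Only the generated code differs, which is why the thread treats inlining as a quality-of-implementation matter.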
afaik, it is entirely up to the compiler, which is where it should be in almost all cases. I think I remember there being discussion about the use of the inline keyword as something to _force_ the compiler to inline, which I kind of like, but maybe using that keyword is bad, since all the C++ programmers will use it everywhere, which may not be appropriate. force_inline or forceinline might be better, as they're uglier, or even

    forceinline
    {
        void MemClear(char *p,int size) { memset(p,0,size); }
    }

which would be unambiguous and quite obvious

"Helmut Leitner" <helmut.leitner chello.at> wrote in message news:3EA3AEB3.DDC28183 chello.at...
> What do we know about inlining except that the compiler will do it when it feels so? Is there a guarantee that a simple macro-like definition like
>
>     void MemClear(char *p,int size) { memset(p,0,size); }
>
> will be inlined? What if this goes through multiple levels?
>
> -- Helmut Leitner leitner hls.via.at Graz, Austria www.hls-software.com
Apr 21 2003
Why not inline(always), inline(prefer), inline(never), inline(SomeConstantComparedToStandardizedInlinabilityIndex)? Like the way version already works?

Matthew Wilson wrote:
> afaik, it is entirely up to the compiler, which is where it should be in almost all cases. I think I remember there being discussion about the use of the inline keyword as something to _force_ the compiler to inline, which I kind of like, but maybe using that keyword is bad, since all the C++ programmers will use it everywhere, which may not be appropriate. force_inline or forceinline might be better, as they're uglier, or even
>
>     forceinline { void MemClear(char *p,int size) { memset(p,0,size); } }
>
> which would be unambiguous and quite obvious
Apr 21 2003
Sounds ok to me

"Ilya Minkov" <midiclub 8ung.at> wrote in message news:b81ms7$2vg8$1 digitaldaemon.com...
> Why not inline(always), inline(prefer), inline(never), inline(SomeConstantComparedToStandardizedInlinabilityIndex)? Like the way version already works?
>
> Matthew Wilson wrote:
>> afaik, it is entirely up to the compiler, which is where it should be in almost all cases. I think I remember there being discussion about the use of the inline keyword as something to _force_ the compiler to inline, which I kind of like, but maybe using that keyword is bad, since all the C++ programmers will use it everywhere, which may not be appropriate. force_inline or forceinline might be better, as they're uglier, or even
>>
>>     forceinline { void MemClear(char *p,int size) { memset(p,0,size); } }
>>
>> which would be unambiguous and quite obvious
Apr 21 2003
"Helmut Leitner" <helmut.leitner chello.at> wrote in message news:3EA3AEB3.DDC28183 chello.at...
> What do we know about inlining except that the compiler will do it when it feels so? Is there a guarantee that a simple macro-like definition like
>
>     void MemClear(char *p,int size) { memset(p,0,size); }
>
> will be inlined? What if this goes through multiple levels?

Think of inlining like the obsolete register keyword in C. Whether obvious inlining is done or not is a quality of implementation issue, not a language issue.
Apr 24 2003
>> will be inlined? What if this goes through multiple levels?
> Think of inlining like the obsolete register keyword in C. Whether obvious inlining is done or not is a quality of implementation issue, not a language issue.

I agree, in the future most D compilers could have various compile-for-speed and compile-for-size switches for various environments ( ex: small embedded targets )

The design of the language should also allow for "Global System Analysis" see JOOP article May 2001 or look for similar info at
http://smarteiffel.loria.fr/
http://smarteiffel.loria.fr/papers/papers.html
Apr 25 2003
"Mark T" <Mark_member pathlink.com> wrote in message news:b8bb7j$d7b$1 digitaldaemon.com...
> I agree, in the future most D compilers could have various compile-for-speed and compile-for-size switches for various environments ( ex: small embedded targets )

Yes.

> The design of the language should also allow for "Global System Analysis" see JOOP article May 2001 or look for similar info at
> http://smarteiffel.loria.fr/
> http://smarteiffel.loria.fr/papers/papers.html

D's design does allow for extensive inter-module analysis, although DMD makes no attempt at it.
Apr 25 2003
On Thu, 24 Apr 2003 13:03:22 -0700, Walter <walter digitalmars.com> wrote:
> Think of inlining like the obsolete register keyword in C. Whether obvious inlining is done or not is a quality of implementation issue, not a language issue.

It'd still be nice to have a way of explicitly saying that a function either must or must not be inlined. For example, the dynamic linker in the GNU libc will break if certain functions are not inlined, because the relocation has not yet been done. The schedule() function in Linux will break on sparc (and perhaps some other platforms) if it is inlined, if you switch to a task that entered the scheduler via a different containing function.

As for the analogy with the register keyword, GCC extends that to allow you to explicitly place variables in specific registers, which is useful in conjunction with assembly code. The uselessness of the original keyword does not mean that anything similar is also useless.

Neither of these are things you'd need very often, but when you do, it'd be really unpleasant if they weren't there. After all, D claims to support "Down and dirty programming". :-)

-Scott
Apr 27 2003
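Editorial aside: the "must" and "must not" cases described above are what GCC's `always_inline` and `noinline` function attributes address. These are compiler extensions, not part of C, and not something D offered at the time of the thread. A minimal sketch, assuming GCC or Clang; the macro and function names (`MUST_INLINE`, `MUST_NOT_INLINE`, `add_one`, `schedule_stub`, `run`) are hypothetical.

```c
#include <assert.h>

#if defined(__GNUC__)
#  define MUST_INLINE     static inline __attribute__((always_inline))
#  define MUST_NOT_INLINE __attribute__((noinline))
#else
/* Fallback for other compilers: plain hints only, no guarantee. */
#  define MUST_INLINE     static inline
#  define MUST_NOT_INLINE
#endif

/* Requested to be expanded at every call site. */
MUST_INLINE int add_one(int x)
{
    return x + 1;
}

/* Requested to stay a real, separate function (a stand-in for something
   like a scheduler entry point that must not be duplicated). */
MUST_NOT_INLINE int schedule_stub(int task)
{
    assert(task >= 0);   /* task ids are non-negative in this toy */
    return task * 2;
}

int run(int task)
{
    return schedule_stub(add_one(task));
}
```

Calling `run(3)` yields 8 either way. The attributes change how the code is laid out, not what it computes.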
Who says the register keyword is useless? I remember some case of some guys using a fairly recent GCC, where they could raise performance by 20% by putting in the simple register hint in a couple of spots. While the compilers are getting smart, they don't know anything particular about the program's typical input values, as the programmer usually does.

-i.

Scott Wood wrote:
> As for the analogy with the register keyword, GCC extends that to allow you to explicitly place variables in specific registers, which is useful in conjunction with assembly code. The uselessness of the original keyword does not mean that anything similar is also useless.
>
> Neither of these are things you'd need very often, but when you do, it'd be really unpleasant if they weren't there. After all, D claims to support "Down and dirty programming". :-)
>
> -Scott
Apr 28 2003
"Scott Wood" <scott buserror.net> wrote in message news:slrnbaoctt.3jp.scott ti.buserror.net...
> On Thu, 24 Apr 2003 13:03:22 -0700, Walter <walter digitalmars.com> wrote:
>> Think of inlining like the obsolete register keyword in C. Whether obvious inlining is done or not is a quality of implementation issue, not a language issue.
> It'd still be nice to have a way of explicitly saying that a function either must or must not be inlined. For example, the dynamic linker in the GNU libc will break if certain functions are not inlined, because the relocation has not yet been done. The schedule() function in Linux will break on sparc (and perhaps some other platforms) if it is inlined, if you switch to a task that entered the scheduler via a different containing function.

I suspect those functions are heavily dependent on how a *particular* compiler generates code for that. Depending on that is going outside of the language definition. It makes successful operation of the code overly sensitive to particular compiler versions, etc. (Some linux kernel developers are open about the kernel code being heavily dependent on how a particular revision of GCC generates code.) You could as easily write code in D that depends on a particular implementation of D, though with D's support for inline assembler I'd argue that is unnecessary.

> As for the analogy with the register keyword, GCC extends that to allow you to explicitly place variables in specific registers, which is useful in conjunction with assembly code. The uselessness of the original keyword does not mean that anything similar is also useless.

Those features are not part of the C language; although they are part of GCC, they will not work with every version of GCC, and will not work with any other C compiler. Contrast that with D, which has defined support for inline assembler. Try doing some inline assembler work in GCC, then with D. I think you'll find it supported far better in D, despite GCC's extensions.

> Neither of these are things you'd need very often, but when you do, it'd be really unpleasant if they weren't there. After all, D claims to support "Down and dirty programming". :-)

Those things are what the inline assembler is for, and D has very strong support for inline assembler. The C language itself has no support at all for inline assembler, and GCC's support for it is very weak and error-prone (for example, there's an arcane syntax you have to add to say which registers were read and which were written by each asm block - get that wrong, and your code will behave unpredictably. D, on the other hand, keeps track of that automatically).
May 03 2003
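Editorial aside: for readers who have not seen it, the GCC syntax being criticised looks roughly like the sketch below. It assumes GCC or Clang targeting x86 and falls back to plain C elsewhere so that it still builds. `scaled_add` is a hypothetical function, not something from the thread.

```c
#include <assert.h>

/* Computes a + 2*b. On x86 with GCC/Clang this goes through extended
   asm (AT&T operand order: source first, destination second). */
int scaled_add(int a, int b)
{
    int result = a;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int scratch;
    __asm__ ("movl %2, %1\n\t"     /* scratch = b          */
             "addl %1, %1\n\t"     /* scratch += scratch   */
             "addl %1, %0"         /* result  += scratch   */
             : "+r"  (result),     /* %0: read and written, any register  */
               "=&r" (scratch)     /* %1: written early, any register     */
             : "r"   (b)           /* %2: read only, any register         */
             : "cc");              /* clobber: condition flags change     */
#else
    result += 2 * b;               /* portable fallback, same arithmetic  */
#endif
    return result;
}

/* Small usage check. */
void demo_scaled_add(void)
{
    assert(scaled_add(1, 2) == 5);
}
```

The three colon-separated lists (outputs, inputs, clobbers) are the part that must be written by hand and kept accurate. In exchange, the compiler picks the registers, including the scratch one.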
On Sat, 3 May 2003 14:38:10 -0700, Walter <walter digitalmars.com> wrote:
> "Scott Wood" <scott buserror.net> wrote in message news:slrnbaoctt.3jp.scott ti.buserror.net...
>> It'd still be nice to have a way of explicitly saying that a function either must or must not be inlined. For example, the dynamic linker in the GNU libc will break if certain functions are not inlined, because the relocation has not yet been done. The schedule() function in Linux will break on sparc (and perhaps some other platforms) if it is inlined, if you switch to a task that entered the scheduler via a different containing function.
> I suspect those functions are heavily dependent on how a *particular* compiler generates code for that.

Not particularly, at least in the case of the scheduler. The scheduler's only concern with inlining is that the destination thread doesn't resume in the wrong inlined instance. The inline assembly is non-portable as well, but only because inline assembly is not part of C.

> Depending on that is going outside of the language definition.

That depends on what the language definition is. :-)

> It makes successful operation of the code overly sensitive to particular compiler versions, etc. (Some linux kernel developers are open about the kernel code being heavily dependent on how a particular revision of GCC generates code.)

Some bits have been, but it's mainly been due to Linux developers ignoring GCC's own rules for things like inline assembly constraints, or making assumptions about weird stuff like "inline" assembly outside of any function.

> Those things are what the inline assembler is for, and D has very strong support for inline assembler.

How do you use the inline assembler to tell the compiler not to inline a certain function written in D, not assembly?

> The C language itself has no support at all for inline assembler, and GCC's support for it is very weak and error-prone (for example, there's an arcane syntax you have to add to say which registers were read and which were written by each asm block - get that wrong, and your code will behave unpredictably. D, on the other hand, keeps track of that automatically).

Is there a way in D inline assembly to ask for a temporary register without mandating a specific one?

How about specifying clobbers that aren't explicitly in the code, such as when calling a function with an unusual calling convention, or when switching threads?

Also, one of the example code sequences is this:

    void *pc;
    asm
    {
        call L1          ;
    L1:                  ;
        pop  EBX         ;
        mov  pc[EBP],EBX ; // pc now points to code at L1
    }

Why do you need to specify EBP when accessing pc? Shouldn't the compiler know what the best way to access pc is? It might want to get rid of the frame pointer, or it might want to keep it around in a register for use after the asm block, etc.

GCC's inline assembly also has the sometimes desirable attribute that the compiler doesn't touch the instructions you specify, other than to schedule the block and substitute the things you asked it to. Will a D compiler be allowed to stick code in the middle of it, in order to satisfy symbolic references, or to schedule instructions? Is it allowed to optimize away mov instructions if it can get the data there on its own? Can it move memory accesses across the asm block?

Usually, those sorts of things would be beneficial, but there should be a way to tell it not to do it.

-Scott
May 04 2003
"Scott Wood" <scott buserror.net> wrote in message news:slrnbbalqd.ud.scott ti.buserror.net...
> Some bits have been, but it's mainly been due to Linux developers ignoring GCC's own rules for things like inline assembly constraints, or making assumptions about weird stuff like "inline" assembly outside of any function.
>> Those things are what the inline assembler is for, and D has very strong support for inline assembler.
> How do you use the inline assembler to tell the compiler not to inline a certain function written in D, not assembly?

The compiler does not optimize inline assembly that you write. Therefore, if you use the inline assembler to call a function, that function won't be inlined.

>> The C language itself has no support at all for inline assembler, and GCC's support for it is very weak and error-prone (for example, there's an arcane syntax you have to add to say which registers were read and which were written by each asm block - get that wrong, and your code will behave unpredictably. D, on the other hand, keeps track of that automatically).
> Is there a way in D inline assembly to ask for a temporary register without mandating a specific one?

No. The idea is "what you write is what you get" with the inline assembler.

> How about specifying clobbers that aren't explicitly in the code, such as when calling a function with an unusual calling convention, or when switching threads?

Called functions must follow the normal register saving convention. If it is an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.

> Also, one of the example code sequences is this:
>
>     void *pc;
>     asm
>     {
>         call L1          ;
>     L1:                  ;
>         pop  EBX         ;
>         mov  pc[EBP],EBX ; // pc now points to code at L1
>     }
>
> Why do you need to specify EBP when accessing pc? Shouldn't the compiler know what the best way to access pc is? It might want to get rid of the frame pointer, or it might want to keep it around in a register for use after the asm block, etc.

The compiler doesn't do frame pointer optimization when the inline assembler is used, because the results of the inline assembler shouldn't be affected by whether optimization is on or off. If you want, though, you can use the 'naked' pseudo-op and write the entire function in assembler, and what you write is what you get.

> GCC's inline assembly also has the sometimes desirable attribute that the compiler doesn't touch the instructions you specify, other than to schedule the block and substitute the things you asked it to. Will a D compiler be allowed to stick code in the middle of it, in order to satisfy symbolic references, or to schedule instructions? Is it allowed to optimize away mov instructions if it can get the data there on its own? Can it move memory accesses across the asm block?

The D compiler does not schedule, move around, optimize, or alter the inline assembler instructions. The assumption is that if the programmer is going to use inline assembler, the programmer knows exactly what he wants, and will write it that way. What you write is what you get.

> Usually, those sorts of things would be beneficial, but there should be a way to tell it not to do it.

I guess I'm philosophically opposed to such things. I much prefer the straightforward approach of inline assembler that what you write is what you get. I also find it odd that gcc provides such things, yet still requires me to specify which registers were read/written for the simplest inline asm.
May 07 2003
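Editorial aside: on the question of moving memory accesses across an asm block, GCC's syntax expresses the "do not move them" case by listing "memory" as a clobber. A minimal sketch of that idiom, assuming GCC or Clang: it constrains the compiler only, not the CPU, and it is not a thread-safe protocol. `COMPILER_BARRIER`, `publish`, `consume` and the two variables are hypothetical names.

```c
#include <assert.h>

#if defined(__GNUC__)
/* Empty asm that claims to touch memory: the compiler will not move
   loads or stores across it. This is not a hardware fence. */
#  define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
#else
#  define COMPILER_BARRIER() ((void)0)  /* no equivalent assumed here */
#endif

static int shared_data;   /* hypothetical payload */
static int shared_flag;   /* hypothetical "payload is ready" marker */

void publish(int value)
{
    assert(!shared_flag);  /* publish only once in this toy example */
    shared_data = value;
    COMPILER_BARRIER();    /* keep the data store before the flag store */
    shared_flag = 1;
}

/* Returns the payload, or -1 if nothing has been published yet. */
int consume(void)
{
    if (!shared_flag)
        return -1;
    COMPILER_BARRIER();    /* keep the data load after the flag load */
    return shared_data;
}
```

By contrast, the D behaviour described in the thread treats every asm block as if it carried this clobber, so there is nothing to write and nothing to opt out of.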
On Wed, 7 May 2003 11:11:40 -0700, Walter <walter digitalmars.com> wrote:
> The compiler does not optimize inline assembly that you write. Therefore, if you use the inline assembler to call a function, that function won't be inlined.

I suppose, though it'd be a little awkward to use the assembler just to call a function without it being inlined.

Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly. I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization. Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.

>> How about specifying clobbers that aren't explicitly in the code, such as when calling a function with an unusual calling convention, or when switching threads?
> Called functions must follow the normal register saving convention. If it is an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.

Which would defeat the purpose of using a special convention. For example, on a mutex implementation, one might want to make the contended case call a function that saves all registers, so that the common case doesn't have to spill any registers (other than whatever's needed to test the mutex). Thread switching would also be slower on architectures with a reasonable number of registers if you have to manually save all of them just because you can't tell the compiler to save (or reconstruct) the 2 or 3 it might still care about.

BTW, will there be any way to tell the inline assembler to put some code out-of-line? Something like:

    inline int lock_mutex(Mutex m)
    {
        int new = whatever_goes_in_there;

        asm {
            eax = 0; /* This tells the compiler to get a zero into eax, in whatever way it chooses. Maybe the caller (which is inlining this function) had one lying around in a register, and it can now choose to use eax for that variable. */

            lock; cmpxchg [m.lock], new;
            jz failed;

            outofline {
            failed: /* I hope this label isn't visible outside of this instantiation of this assembly block... */
                push ecx;
                push edx;
                call handle_failed;
                pop edx;
                pop ecx;
                return; /* This tells the compiler to exit the assembly block. Alternatively, a return label could be declared. */
            }

            /* Tell the compiler that these registers were not, in fact, clobbered. It can't assume it automatically, though, since it has no idea what handle_failed might be doing to those values on the stack. Or, to save space, I may have buried those pushes into a wrapper assembly function instead, where the compiler probably won't see them. */
            noclobber ecx, edx;

            /* Tell the compiler that, since this thing acts as a mutex, no memory accesses can be reordered across it. It's probably not necessary in this case, though, as it contains a function call. */
            clobber memory;
        }
    }

>> Also, one of the example code sequences is this:
>>
>>     void *pc;
>>     asm
>>     {
>>         call L1          ;
>>     L1:                  ;
>>         pop  EBX         ;
>>         mov  pc[EBP],EBX ; // pc now points to code at L1
>>     }
>>
>> Why do you need to specify EBP when accessing pc? Shouldn't the compiler know what the best way to access pc is? It might want to get rid of the frame pointer, or it might want to keep it around in a register for use after the asm block, etc.
> The compiler doesn't do frame pointer optimization when the inline assembler is used, because the results of the inline assembler shouldn't be affected by whether optimization is on or off.

But it wouldn't affect the results, if the compiler handles the assignment to pc rather than the programmer. And what if I move to a compiler that *never* uses frame pointers? The code is now broken, because I had to make an assumption about what the compiler was doing with its registers.

Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?

> If you want, though, you can use the 'naked' pseudo-op and write the entire function in assembler, and what you write is what you get.

Yes, but you can get that by using an external assembler as well. The point of inline assembly is to, well, be inline. :-)

> The D compiler does not schedule, move around, optimize, or alter the inline assembler instructions. The assumption is that if the programmer is going to use inline assembler, the programmer knows exactly what he wants, and will write it that way. What you write is what you get.

The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make. GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions. Removing the ability of the compiler to make the decisions will lead to slower code.

> I guess I'm philosophically opposed to such things. I much prefer the straightforward approach of inline assembler that what you write is what you get. I also find it odd that gcc provides such things, yet still requires me to specify which registers were read/written for the simplest inline asm.

It's not really that odd, seeing as it needs those features to make up for its inability to parse the assembly code itself. However, those features end up granting the programmer more power than what they replace.

-Scott
May 07 2003
"Scott Wood" <scott buserror.net> wrote in message news:slrnbbjfj0.1a2.scott ti.buserror.net...
> On Wed, 7 May 2003 11:11:40 -0700, Walter <walter digitalmars.com> wrote:
> I suppose, though it'd be a little awkward to use the assembler just to call a function without it being inlined.

I'd agree with that.

> Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly. I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization. Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.

I think that comes with the territory of using a high level language. If a particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to the max, write it in inline assembler.

>>> How about specifying clobbers that aren't explicitly in the code, such as when calling a function with an unusual calling convention, or when switching threads?
>> Called functions must follow the normal register saving convention. If it is an unusual function that clobbers other registers, you'll need to save/restore them in the inline assembler.
> Which would defeat the purpose of using a special convention. For example, on a mutex implementation, one might want to make the contended case call a function that saves all registers, so that the common case doesn't have to spill any registers (other than whatever's needed to test the mutex). Thread switching would also be slower on architectures with a reasonable number of registers if you have to manually save all of them just because you can't tell the compiler to save (or reconstruct) the 2 or 3 it might still care about.
>
> BTW, will there be any way to tell the inline assembler to put some code out-of-line? Something like:
>
>     inline int lock_mutex(Mutex m)
>     {
>         int new = whatever_goes_in_there;
>         asm {
>             eax = 0; /* This tells the compiler to get a zero into eax, in whatever way it chooses. Maybe the caller (which is inlining this function) had one lying around in a register, and it can now choose to use eax for that variable. */

The Digital Mars C++ compiler can do this, but after having that capability for 15 years it just never proved out to be very useful.

>             lock; cmpxchg [m.lock], new;
>             jz failed;
>             outofline {
>             failed: /* I hope this label isn't visible outside of this instantiation of this assembly block... */

Yes, it is visible outside. All labels are in one scope per function, including the inline asm labels.

>                 push ecx;
>                 push edx;
>                 call handle_failed;
>                 pop edx;
>                 pop ecx;
>                 return; /* This tells the compiler to exit the assembly block. Alternatively, a return label could be declared. */

Exit the assembly block? I don't know what you mean by that.

>             }
>             /* Tell the compiler that these registers were not, in fact, clobbered. It can't assume it automatically, though, since it has no idea what handle_failed might be doing to those values on the stack. Or, to save space, I may have buried those pushes into a wrapper assembly function instead, where the compiler probably won't see them. */
>             noclobber ecx, edx;

That might be a reasonable addition.

>             /* Tell the compiler that, since this thing acts as a mutex, no memory accesses can be reordered across it. It's probably not necessary in this case, though, as it contains a function call. */
>             clobber memory;

Unnecessary, as the inline assembler assumes memory is clobbered.

>         }
>     }
>
>>> Also, one of the example code sequences is this:
>>>
>>>     void *pc;
>>>     asm
>>>     {
>>>         call L1          ;
>>>     L1:                  ;
>>>         pop  EBX         ;
>>>         mov  pc[EBP],EBX ; // pc now points to code at L1
>>>     }
>>>
>>> Why do you need to specify EBP when accessing pc? Shouldn't the compiler know what the best way to access pc is? It might want to get rid of the frame pointer, or it might want to keep it around in a register for use after the asm block, etc.
>> The compiler doesn't do frame pointer optimization when the inline assembler is used, because the results of the inline assembler shouldn't be affected by whether optimization is on or off.
> But it wouldn't affect the results, if the compiler handles the assignment to pc rather than the programmer. And what if I move to a compiler that *never* uses frame pointers? The code is now broken, because I had to make an assumption about what the compiler was doing with its registers.

When using inline asm, you'll always run the risk of nonportability between compilers - after all, things like register conventions, calling conventions, etc., are not defined by the language. Only the syntax of the inline assembler is.

> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?

Because the inline assembler assembles the code long before any register assignments are done.

>> If you want, though, you can use the 'naked' pseudo-op and write the entire function in assembler, and what you write is what you get.
> Yes, but you can get that by using an external assembler as well. The point of inline assembly is to, well, be inline. :-)

I'm currently porting D to linux. Believe me, the inline assembler is a great boon to that. Just try converting MASM files to gas files! To me, using gas is like trying to write code looking in a mirror.

>> The D compiler does not schedule, move around, optimize, or alter the inline assembler instructions. The assumption is that if the programmer is going to use inline assembler, the programmer knows exactly what he wants, and will write it that way. What you write is what you get.
> The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make. GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions. Removing the ability of the compiler to make the decisions will lead to slower code.

You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.

>> I guess I'm philosophically opposed to such things. I much prefer the straightforward approach of inline assembler that what you write is what you get. I also find it odd that gcc provides such things, yet still requires me to specify which registers were read/written for the simplest inline asm.
> It's not really that odd, seeing as it needs those features to make up for its inability to parse the assembly code itself. However, those features end up granting the programmer more power than what they replace.

I understand what you're driving at. It is heavily integrated in with how gcc parses, optimizes, and generates code. I don't think that's a good thing to put in a language spec, as it may unnecessarily constrain how the compiler is built.
May 08 2003
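Editorial aside: the "large switch statement on a constant" case mentioned in this exchange can be sketched in C as follows. The assumption is that once the helper is inlined into a caller that passes a compile-time constant size, an optimizing compiler can typically discard every branch but one. `FORCE_INLINE`, `copy_fixed` and `copy_word` are hypothetical names, and the helper assumes suitably aligned pointers of the matching type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__GNUC__)
#  define FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE static inline   /* hint only on other compilers */
#endif

/* Size-dispatched copy. Looks big in source form, but with a constant
   n at the call site only one case is reachable after inlining. */
FORCE_INLINE void copy_fixed(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    switch (n) {
    case 1:  *(uint8_t  *)dst = *(const uint8_t  *)src; break;
    case 2:  *(uint16_t *)dst = *(const uint16_t *)src; break;
    case 4:  *(uint32_t *)dst = *(const uint32_t *)src; break;
    default: memcpy(dst, src, n);                       break;
    }
}

uint32_t copy_word(uint32_t in)
{
    uint32_t out = 0;
    copy_fixed(&out, &in, sizeof out);   /* n is the constant 4 here */
    return out;
}
```

If the compiler declines to inline `copy_fixed`, every caller pays for the full switch at run time. That is the situation the forced-inline request is meant to rule out.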
Walter wrote:
> "Scott Wood" <scott buserror.net> wrote in message news:slrnbbjfj0.1a2.scott ti.buserror.net...

[-snip-]

>> return; /* This tells the compiler to exit the assembly block. Alternatively, a return label could be declared. */
> Exit the assembly block? I don't know what you mean by that.

If that means what I think is intended, should 'break' be more appropriate?

>> } /* Tell the compiler that these registers were not, in fact, clobbered. It can't assume it automatically, though, since it has no idea what handle_failed might be doing to those values on the stack. Or, to save space, I may have buried those pushes into a wrapper assembly function instead, where the compiler probably won't see them. */ noclobber ecx, edx;
> That might be a reasonable addition.

Agreed, though I would change the keyword, maybe 'retain' would be good, or the list could be added to the assembler declaration ..

    assembler:
          'asm' '(' '!' noClobberList ')' '{' assemblerStatements '}'
        | 'asm' '{' assemblerStatements '}'
        ;

    noClobberList:
          registerName ',' noClobberList
        | registerName
        ;

such as ...

    asm (! ecx, edx ) {
        xor eax, eax
        push ecx
        call myFunc;
    }

This is efficient, but its meaning is not immediately clear.

C 2003/5/8
May 07 2003
On Thu, 8 May 2003 10:50:21 -0700, Walter <walter digitalmars.com> wrote:
> "Scott Wood" <scott buserror.net> wrote in message news:slrnbbjfj0.1a2.scott ti.buserror.net...
>> Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly. I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization. Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.
> I think that comes with the territory of using a high level language. If a particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to the max, write it in inline assembler.

Except that in this case, using inline assembler would have made it worse. The code was expecting to have the switch(constant) optimized away to just the relevant case. Writing the containing functions in assembly would not have been realistic, as the to-be-inlined functions were used all over the source tree (they were used to move data to/from userspace).

>> eax = 0; /* This tells the compiler to get a zero into eax, in whatever way it chooses. Maybe the caller (which is inlining this function) had one lying around in a register, and it can now choose to use eax for that variable. */
> The Digital Mars C++ compiler can do this, but after having that capability for 15 years it just never proved out to be very useful.

It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?

>> lock; cmpxchg [m.lock], new; jz failed;
>> outofline { failed: /* I hope this label isn't visible outside of this instantiation of this assembly block... */
> Yes, it is visible outside. All labels are in one scope per function, including the inline asm labels.

I was more worried about it being visible throughout the file (or caller of the inline function), like it would have been in GCC, since there's no support for find-the-first-one-in-a-given-direction labels.

>> push ecx; push edx; call handle_failed; pop edx; pop ecx;
>> return; /* This tells the compiler to exit the assembly block. Alternatively, a return label could be declared. */
> Exit the assembly block? I don't know what you mean by that.

Just a shortcut for declaring a new label at the end and branching there, which is a rather common construct (especially when using out-of-line sections). I agree with "C" that break would be a better keyword, though.

>> /* Tell the compiler that, since this thing acts as a mutex, no memory accesses can be reordered across it. It's probably not necessary in this case, though, as it contains a function call. */
>> clobber memory;
> Unnecessary, as the inline assembler assumes memory is clobbered.

It'd be nice if the language didn't force the compiler to do this in all cases, though. For instance, it's not necessary when just reading timestamps, or making use of some fancy computational instruction for which the compiler doesn't have an intrinsic, or as a touch-up in a critical function that the compiler doesn't optimize well enough. At the very least, "noclobber memory" should exist, but a compiler should also be allowed to look for itself. If the compiler doesn't support this, it could always fall back on assuming "clobber memory" for everything.

> When using inline asm, you'll always run the risk of nonportability between compilers - after all, things like register conventions, calling conventions, etc., are not defined by the language. Only the syntax of the inline assembler is.

But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details? If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.

>> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
> Because the inline assembler assembles the code long before any register assignments are done.

That's a compiler implementation detail. Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler). If the compiler has to choose registers for the asm block in advance, it could just add the store instruction itself at the time it handles the inline assembly (in which case you get exactly the same code as you do now), or it could remember which register the asm block used and use that in the subsequent non-asm code.

>> The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make. GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions. Removing the ability of the compiler to make the decisions will lead to slower code.
> You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.

It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions. In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs. The compiler is free to not implement them if it doesn't feel they're important.

-Scott
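For readers who have not seen the pattern Scott describes (a large switch on a constant that collapses once the function is inlined), here is a minimal sketch in C. It is not code from the thread: the names `copy_small` and `load_word` are hypothetical, and `__attribute__((always_inline))` is a GCC/Clang extension, not part of D or of standard C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, loosely modelled on a "copy a few bytes" routine.
 * always_inline (GCC/Clang extension) asks the compiler to inline every
 * call instead of applying its own size heuristic. */
static inline __attribute__((always_inline))
void copy_small(void *dst, const void *src, size_t size)
{
    /* The function looks big, but at a call site where size is a
     * compile-time constant an optimizing build can discard every case
     * except the matching one. */
    switch (size) {
    case 1:  memcpy(dst, src, 1);    break;
    case 2:  memcpy(dst, src, 2);    break;
    case 4:  memcpy(dst, src, 4);    break;
    case 8:  memcpy(dst, src, 8);    break;
    default: memcpy(dst, src, size); break;
    }
}

/* size is the constant sizeof(uint32_t) here, so after inlining only the
 * 4-byte case is relevant. */
uint32_t load_word(const void *src)
{
    uint32_t value;
    copy_small(&value, src, sizeof value);
    return value;
}
```

Whether the unused cases are actually removed still depends on the optimization level; the attribute only settles the inlining decision, which is the part Scott wanted to be able to state explicitly.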
May 08 2003
"Scott Wood" <scott buserror.net> wrote in message news:slrnbbluah.1cq.scott ti.buserror.net...
> On Thu, 8 May 2003 10:50:21 -0700, Walter <walter digitalmars.com> wrote:
>> "Scott Wood" <scott buserror.net> wrote in message news:slrnbbjfj0.1a2.scott ti.buserror.net...
>>> Still, I'm a bit uncomfortable with the idea that the compiler's always right and cannot be corrected, even explicitly. I've seen GCC silently decide not to inline a function (on which inlining was requested) because it was "too big", even though it was just a large switch statement on a constant, which ended up being one or two instructions after optimization. Given that no compiler is going to make the right choice all the time, it's nice to be able to declare one's intent when there's a clear reason to do so.
>> I think that comes with the territory of using a high level language. If a particular routine is a major bottleneck in your program (and it does usually come down to one!), and you want to make the effort to tune it to the max, write it in inline assembler.
> Except that in this case, using inline assembler would have made it worse. The code was expecting to have the switch(constant) optimized away to just the relevant case. Writing the containing functions in assembly would not have been realistic, as the to-be-inlined functions were used all over the source tree (they were used to move data to/from userspace).

I see the inline/not inline as a quality of implementation issue. The language design should specify semantics, and the semantics should not change if something is inlined or not. I want to allow the compiler writer to be as free as possible to innovate how D is implemented. Trying to specify exactly what optimizations are performed in the language spec can forestall that. Note that DMD has a compiler switch to turn inlining on or off.

>>> eax = 0; /* This tells the compiler to get a zero into eax, in whatever way it chooses. Maybe the caller (which is inlining this function) had one lying around in a register, and it can now choose to use eax for that variable. */
>> The Digital Mars C++ compiler can do this, but after having that capability for 15 years it just never proved out to be very useful.
> It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?

It's not worth it. I have a lot of practice writing fast applications (DMC is the fastest compiler, and has been for 15 years).

>>> /* Tell the compiler that, since this thing acts as a mutex, no memory accesses can be reordered across it. It's probably not necessary in this case, though, as it contains a function call. */
>>> clobber memory;
>> Unnecessary, as the inline assembler assumes memory is clobbered.
> It'd be nice if the language didn't force the compiler to do this in all cases, though. For instance, it's not necessary when just reading timestamps, or making use of some fancy computational instruction for which the compiler doesn't have an intrinsic, or as a touch-up in a critical function that the compiler doesn't optimize well enough. At the very least, "noclobber memory" should exist, but a compiler should also be allowed to look for itself. If the compiler doesn't support this, it could always fall back on assuming "clobber memory" for everything.

I misspoke. It doesn't do it in cases where none of the asm instructions could possibly modify memory.

>> When using inline asm, you'll always run the risk of nonportability between compilers - after all, things like register conventions, calling conventions, etc., are not defined by the language. Only the syntax of the inline assembler is.
> But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details? If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.

One thing I do in inline asm sometimes is muck with stack and the frame registers. The variable name gives me an offset as if I hadn't - I then adjust it as necessary.

>>> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
>> Because the inline assembler assembles the code long before any register assignments are done.
> That's a compiler implementation detail. Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler).

They may not have that restriction, yes, but I don't want to force the compiler to be built that way. I want to keep the bar low for building a basic spec compliant D compiler, while making it possible to build very advanced spec compliant ones.

>>> The problem is that the programmer can't know exactly what he wants, without knowing some decisions that the compiler will make. GCC's syntax allows the programmer to tell the compiler exactly where to substitute those decisions. Removing the ability of the compiler to make the decisions will lead to slower code.
>> You are correct in the abstract. In my experience, I believe the difference to be negligible. I profile code extensively to make it faster. The bottlenecks turn out to be maybe 30 lines of code out of a few thousand. Those I just write completely in hand-tuned inline assembler.
> It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions. In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs. The compiler is free to not implement them if it doesn't feel they're important.

If the compiler is free not to implement it, then it can't be part of the language spec. D doesn't preclude any vendors from adding extensions, though. Extensions are important as they're how new innovations get tried out. The good ones will wind up getting folded into D.

I'm not sure what you mean by portable, as GCC's way of doing inline assembler is not portable to any other compiler. As far as I've been able to figure out (with google), most of it isn't even documented. I figured out how to use it by reading the kernel listings.

I'm currently in the process of building a linux version of D. It's pretty sweet to be able to take the inline asm code from win32 and recompile it under linux and it works just the same with no modification. That's a hopeless task if you're using separate asm files, or if you're using the inline assembler from a C compiler. I've even got obj2asm to work on elf files, so now you can disassemble .o files and see it in intel syntax!

P.S. How I write a whole function in hand-tuned asm is write it in C, compile it, disassemble it with obj2asm, cut & paste the code back into the C source in an asm block, and then tune.
May 09 2003
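The "clobber memory" behaviour argued over above has a direct counterpart in GCC's extended asm, which may help make the discussion concrete. The following is a sketch in C under stated assumptions, not something posted in the thread: `compiler_barrier`, `publish`, `consume`, `payload` and `ready` are hypothetical names, and the `"memory"` clobber is GCC/Clang syntax rather than anything in the D inline assembler.

```c
/* Empty asm statement with a "memory" clobber (GCC/Clang extended asm).
 * It emits no instructions; it only tells the compiler that memory may
 * have been read or written, so cached values must be reloaded and
 * memory accesses may not be moved across this point. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" ::: "memory");
}

static int payload; /* hypothetical data being handed over */
static int ready;   /* hypothetical "data is valid" flag   */

void publish(int value)
{
    payload = value;
    compiler_barrier(); /* keep the payload store before the flag store */
    ready = 1;
}

int consume(int *out)
{
    if (!ready)
        return 0;
    compiler_barrier(); /* keep the payload load after the flag check */
    *out = payload;
    return 1;
}
```

This constrains only the compiler's reordering; it is not a CPU fence and does not by itself make the hand-over safe between threads. An asm statement written without the clobber is the GCC-side analogue of the "noclobber memory" case Scott asks for.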
On Fri, 9 May 2003 01:23:17 -0700, Walter <walter digitalmars.com> wrote:
> I see the inline/not inline as a quality of implementation issue.

For the default case, sure. I'll wait until compilers have a full, working AI built in before I trust even the best compiler to *always* get it right, though.

> The language design should specify semantics, and the semantics should not change if something is inlined or not. I want to allow the compiler writer to be as free as possible to innovate how D is implemented. Trying to specify exactly what optimizations are performed in the language spec can forestall that.

I'm not suggesting that the language mandate certain optimizations; just that there be a standard way of communicating one's intentions to the compiler. If the compiler doesn't support inlining at all, then fine, don't inline; however, if it does support it, it should pay attention to the programmer's request.

>> It's a pretty small gain in this case, but what if it were a non-constant, that is almost guaranteed to be in some register before the asm statement?
> It's not worth it.

If there's no cost to it (as is the case with compilers which already implement such things, including GCC), then any optimization is worth it. It doesn't make the language any harder to write a compiler for, as a compiler can choose to always interpret an assignment as a mov statement.

> I have a lot of practice writing fast applications (DMC is the fastest compiler, and has been for 15 years).

But how much do you need to use assembly in a compiler? Take something like a kernel instead, which often needs to use assembly for various things, including the aforementioned copying of data between user and kernel. This is done a lot, and saving a few cycles on every such occurrence *does* show up in the benchmarks, especially since so many of them are just copying one or two words (making the overhead very visible). Loading the value from userspace, then storing it on the stack, then loading it again immediately after the asm block is over will be noticeable. If you're on anything but a non-regparm x86, add the cost of storing the user address to the stack (since it was passed in a register) and then loading it again. The compiler will generally do these sorts of things for its own generated code; it doesn't strike me as a freak occurrence for a compiler to allow the user access to the same thing when using inline assembly.

>> But would it not be better to reduce the potential sources of nonportability, by letting the programmer tell the compiler to handle certain details? If the compiler can know the offset from EBP at assembly time, it presumably knows that it's on the stack, and thus that it should index off of EBP.
> One thing I do in inline asm sometimes is muck with stack and the frame registers. The variable name gives me an offset as if I hadn't - I then adjust it as necessary.

If you can specify that the value must be in a register in the beginning and/or end of the block, you don't need to worry about the validity of the address in the middle of the block.

>>>> Plus, pc is probably going to be used soon after the asm block; why force it onto the stack and then back?
>>> Because the inline assembler assembles the code long before any register assignments are done.
>> That's a compiler implementation detail. Other compilers might not have that restriction (for example, they may allow the registers to be patched into the assembled code later on, or use an external assembler).
> They may not have that restriction, yes, but I don't want to force the compiler to be built that way.

If the compiler isn't built that way, just act as if the user put a mov instruction there. If the syntax allows the user to ask the compiler to choose the register, it can pick one arbitrarily if it's not capable of picking a good one.

>> It's a little harder when it's 30,000 lines out of a few million, and most of that needs to stay portable, so any assembler has to be buried in separate inline functions. In any case, I don't think the language should throw away the opportunity for such optimizations just because they don't help the majority of programs. The compiler is free to not implement them if it doesn't feel they're important.
> If the compiler is free not to implement it, then it can't be part of the language spec.

The semantics behind what the programmer requests must be implemented; it's the optimization that the semantics allow that does not need to be there in simpler compilers.

> D doesn't preclude any vendors from adding extensions, though. Extensions are important as they're how new innovations get tried out. The good ones will wind up getting folded into D.

Sure. However, this often leads to different compilers implementing the same feature in incompatible ways, requiring programs that want to use the feature to use lots of conditional compilation to remain semi-portable. If the new feature would require significant effort to implement correctly (not necessarily efficiently), then I agree that it should stay out of the language unless it is demonstrated to be sufficiently useful (though it might sometimes be beneficial to formalize it into an optional yet standardized extension, so that if it is implemented, it's implemented in the same way). However, some of these things could be implemented (poorly, but correctly and no worse than if the feature weren't used) with a sed script if one were so inclined.

> I'm not sure what you mean by portable, as GCC's way of doing inline assembler is not portable to any other compiler.

Intel's compiler claims to support GCC inline assembly on x86 (their IA64 compiler apparently doesn't support inline assembly at all). However, in general, the lack of portability of inline assembly between compilers for the same architecture is a bit annoying. I was hoping that, with D's placing it into the language itself, it would cease to be an issue. However, once extensions to the basic syntax are relied on, you're right back to the current state of incompatibility.

> As far as I've been able to figure out (with google), most of it isn't even documented. I figured out how to use it by reading the kernel listings.

It's documented in the GCC info pages. Look for the "Extended Asm" node, as well as the section on constraints.

> I'm currently in the process of building a linux version of D. It's pretty sweet to be able to take the inline asm code from win32 and recompile it under linux and it works just the same with no modification. That's a hopeless task if you're using separate asm files,

Not really. There are Intel-syntax assemblers for Linux (even gas can be told to use it now), and gas is available for Windows should one want to go the other way.

> or if you're using the inline assembler from a C compiler.

Unless you're using the same C compiler on both platforms.

> I've even got obj2asm to work on elf files, so now you can disassemble .o files and see it in intel syntax!

GNU objdump can do that as well, by passing "-m i386:intel".

> P.S. How I write a whole function in hand-tuned asm is write it in C, compile it, disassemble it with obj2asm, cut & paste the code back into the C source in an asm block, and then tune.

And do it over again every time the C code changes, or when a header it depends on changes (if you notice!). Each time, doing it for every supported architecture. It's still a useful technique for certain situations, but it's not a replacement for flexible inline assembly.

-Scott
May 09 2003
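Scott's point about asking the compiler to choose the register can be illustrated with GCC's operand constraints, the mechanism described in the "Extended Asm" documentation he mentions. The sketch below is an illustration only and assumes GCC or Clang; `swap32` is a hypothetical name, and the non-x86 branch is plain C added so the example builds elsewhere.

```c
#include <stdint.h>

/* Byte-swap a 32-bit value. On x86 the "+r" constraint says: keep v in
 * some general-purpose register of the compiler's choosing, used as both
 * input and output. The compiler substitutes that register for %0, so
 * the value need not be stored to the stack and reloaded around the asm. */
static inline uint32_t swap32(uint32_t v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__("bswap %0" : "+r"(v));
    return v;
#else
    /* Portable fallback for other targets or compilers. */
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
#endif
}
```

A compiler that cannot pick a good register can still satisfy the constraint by picking any register and adding the moves itself, which is the fallback Scott proposes for simpler compilers.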
Walter wrote:
> I'm currently porting D to linux. Believe me, the inline assembler is a great boon to that. Just try converting MASM files to gas files! To me, using gas is like trying to write code looking in a mirror.

Why are you using GAS? You can use NASM (or maybe FASM) instead! Both use a (cleaned-up?) Intel syntax. There have also been a number of converters NASM <-> GAS <-> MASM. And besides, the new GAS is said to be able to use Intel syntax.

BTW, I didn't find a reliable way to use NASM with DigitalMars compilers for Windows. It has a Borland format, but somehow it didn't work. I'll try to reproduce this problem sometime later.

-i.
May 10 2003
I did find a reliable way to use NASM with Digital Mars for Win32 and DOSX targets.

The problem is that the common statement

    section .data

or

    section .code

in COFF and other formats is expanded to something like 'dword-aligned 32-bit segment of code (or text)'. When the same statement is used for the OBJ format, it is not treated the same way. To make them identical, you should write

    section .code align=4 use32

As for the DOSX target, the previous is not sufficient. You should write

    section _DATA class=DATA align=4 use32

or

    section _CODE class=CODE align=4 use32

Moreover, you should place somewhere the directive

    group DGROUP _DATA

to tell the linker to group the data segment in this module with the others.

The technique described last (I mean the one for the DOSX target) is fully compatible with Win32 target code. I used it to compile the XVID codec sources for both Win32 and DOSX with DMC, and it works.

BTW, with optimizations turned on, the C version of the codec (when asm is not used) runs almost twice as fast as the unoptimized one. I think the DMC optimizer is cool!

Nic Tiger.

"Ilya Minkov" <midiclub 8ung.at> wrote in message news:b9j6pl$32m$1 digitaldaemon.com...
> Walter wrote:
>> I'm currently porting D to linux. Believe me, the inline assembler is a great boon to that. Just try converting MASM files to gas files! To me, using gas is like trying to write code looking in a mirror.
> Why are you using GAS? You can use NASM (or maybe FASM) instead! Both use a (cleaned-up?) Intel syntax. There have also been a number of converters NASM <-> GAS <-> MASM. And besides, the new GAS is said to be able to use Intel syntax.
> BTW, I didn't find a reliable way to use NASM with DigitalMars compilers for Windows. It has a Borland format, but somehow it didn't work. I'll try to reproduce this problem sometime later.
> -i.
May 10 2003