digitalmars.D - Predefined Version expansion
- Dan (16/16) Apr 02 2007 Following the discussion on D's hashtables, and my current understanding...
- Walter Bright (2/21) Apr 02 2007 The inline assembler does support all the SSE3 instructions, etc.
- Dan (4/7) Apr 02 2007 Yes, the GDC and DMD implementations support all the SSE3 instructions, ...
- torhu (5/11) Apr 02 2007 Wouldn't you need runtime checking for cpu capabilities most of the time...
- Jarrett Billingsley (5/14) Apr 02 2007 Just because their machine doesn't support it, why wouldn't you allow
- Walter Bright (2/11) Apr 02 2007 I think what you need is a runtime check, which is provided in std.cpuid...
- Dan (5/6) Apr 03 2007 So what you're saying is, we can't optimize the compiler for a specific ...
- Derek Parnell (20/26) Apr 03 2007 No, maybe you missed the point. It certainly is possible for one to crea...
- Dan (4/31) Apr 03 2007 Yeah, I guess defining the versions yourself is totally possible, and re...
- Don Clugston (18/22) Apr 03 2007 Dan,
- Walter Bright (2/4) Apr 03 2007 If it is, then one should put the switch at an enclosing level.
- Daniel Keep (14/19) Apr 03 2007 Out of interest, which is faster: a branch at the start of a function
- Don Clugston (8/20) Apr 03 2007 I suspect the bool comparison would be *much* quicker, since the branch
- Pragma (7/28) Apr 04 2007 I'm starting to see what you mean. If DDL kept generalized fixup data a...
- Frits van Bommel (17/20) Apr 04 2007 Wouldn't ld's "--emit-relocs" (aka "-q") remove that limitation? From
- Dave (8/29) Apr 04 2007 I've seen this type of thing used in Intel compiler generated code to go...
- Dan (3/8) Apr 04 2007 There is already a cpu variable/object/module or something within the ph...
- Don Clugston (20/30) Apr 05 2007 Yes. All that's required is for the spec to include standard names for
- janderson (16/58) Apr 05 2007 I've done this manually with a set of DLL's before. Basically the exe
- janderson (5/27) Apr 02 2007 What would be nice is if you could tell the compiler to spit out DLLs
Following the discussion on D's hashtables, and my current understanding that D doesn't implement any of the x86 CPU features beyond Pentium Pro (same as C and most other things for that matter), I took another quick peek at D's predefined versions. Currently, when I want to use asm {} I have to use:

version(D_InlineAsm_X86)

Unfortunately, this doesn't cover most of the later instruction extensions D claims to support in the spec, such as the SSE, SSE2, SSE3, SSSE3, MMX and 3DNow! instruction sets, let alone x86_64. Ultimately, one would want a cascading support mechanism, such that if I want to use SSSE3, I'm implying that the processor must support everything before it:

version(D_InlineAsm_x86)
version(D_InlineAsm_x86_3DNow)
version(D_InlineAsm_x86_MMX)
version(D_InlineAsm_x86_SSE)
version(D_InlineAsm_x86_SSE2)
version(D_InlineAsm_x86_SSE3)
version(D_InlineAsm_x86_SSSE3)
version(D_InlineAsm_x86_SSE4)
version(D_InlineAsm_x86_64)

Predefining those so that compiler writers can raise errors if they don't yet support SSE3 instructions, but do support x86 and MMX, would be a smart thing to do. It would also allow implementors to write code that WON'T compile and try to run if the user's CPU doesn't support it - preventing what could be a devastating crash.

Sincerely,
Dan
Apr 02 2007
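A sketch of how the proposed cascading identifiers might look in practice. The D_InlineAsm_x86_* names below are the proposal from this post, not identifiers any compiler of the time predefines; the hadd4 function is purely illustrative:

```d
// Hypothetical: only D_InlineAsm_X86 exists today; the SSE-level
// identifiers are the ones proposed above.
version (D_InlineAsm_x86_SSE3)
{
    // The compiler has promised it can assemble SSE3 opcodes.
    void hadd4(float* p)
    {
        asm
        {
            mov EAX, p;
            movups XMM0, [EAX];
            haddps XMM0, XMM0;   // SSE3 horizontal add
            haddps XMM0, XMM0;   // all four lanes now hold the sum
            movups [EAX], XMM0;
        }
    }
}
else version (D_InlineAsm_X86)
{
    // Baseline x86 fallback: plain D, no SSE3 opcodes emitted.
    void hadd4(float* p)
    {
        float s = p[0] + p[1] + p[2] + p[3];
        p[0] = p[1] = p[2] = p[3] = s;
    }
}
else
{
    static assert(false, "inline asm not supported on this target");
}
```

A compiler that supports x86 inline asm but not SSE3 would simply never define the SSE3 identifier, so the code falls through to the baseline branch instead of emitting opcodes the target can't run.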
Dan wrote:
> Predefining those so that compiler writers can raise errors if they
> don't yet support SSE3 instructions, but they do support x86 and MMX,
> would be a smart thing to do. It would also allow implementors to
> write code that WON'T compile and try to run if the user's CPU doesn't
> support it - preventing what could be a devastating crash.
<snip>

The inline assembler does support all the SSE3 instructions, etc.
Apr 02 2007
Walter Bright Wrote:
>> Predefining those so that compiler writers can raise errors if they
>> don't yet support SSE3 instructions, but they do support x86 and MMX,
>> would be a smart thing to do. [...]
> The inline assembler does support all the SSE3 instructions, etc.

Yes, the GDC and DMD implementations support all the SSE3 instructions, but someone with an i486 or Pentium Pro does not, and neither might another D implementation implementing the D language spec.

When I use an SSE3 instruction in D, is it going to raise an error if the person's system is only a Pentium Pro? If that could be done by checking the CPU support for each instruction implicitly, then it would be even better than versioning it all off... but I don't think that's the case - and it's not specified?
Apr 02 2007
Dan wrote:
<snip>
> When I use an SSE3 instruction in D, is it going to raise an error if
> the person's system is only a Pentium Pro? If that could be done by
> checking the CPU support for each instruction implicitly, then it
> would be even better than versioning it all off... but I don't think
> that's the case - and it's not specified?

Wouldn't you need runtime checking for cpu capabilities most of the time anyway? The functions in std.cpuid should get you a long way in most cases. Only works for x86, though.
Apr 02 2007
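A minimal sketch of the runtime approach torhu describes, assuming the std.cpuid predicate functions of the day (e.g. sse2()); the addVec* implementations are hypothetical stand-ins:

```d
import std.cpuid;

// Hypothetical SSE2 path (would use inline asm in real code).
void addVecSSE2(float[] dst, float[] src)
{
    foreach (i, s; src)
        dst[i] += s;
}

// Portable fallback.
void addVecPlain(float[] dst, float[] src)
{
    foreach (i, s; src)
        dst[i] += s;
}

// One dispatch decision at startup, instead of a check on every call.
void function(float[], float[]) addVec;

static this()
{
    addVec = std.cpuid.sse2() ? &addVecSSE2 : &addVecPlain;
}
```

The module constructor runs before main, so by the time user code calls addVec the pointer already targets the best implementation the running CPU supports.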
"Dan" <murpsoft hotmail.com> wrote in message news:eurruh$26gc$1 digitalmars.com...Walter Bright Wrote: Yes, the GDC and DMD implementations support all the SSE3 instructions, but someone with a i486 Pentium Pro does not, and neither might another D implementation implementing the D language spec. When I use an SSE3 instruction in D, is it going to raise an error if the person's system is only a Pentium Pro? If that could be done by checking the CPU support for each instruction implicitly, then it would be even better than versioning it all off... but I don't think that's the case - and it's not specified?Just because their machine doesn't support it, why wouldn't you allow someone to use SSE3 instructions? For that matter, who says you have to be running the compiler on an x86 platform to compile to x86 machine code?
Apr 02 2007
Dan wrote:
<snip>
> When I use an SSE3 instruction in D, is it going to raise an error if
> the person's system is only a Pentium Pro? If that could be done by
> checking the CPU support for each instruction implicitly, then it
> would be even better than versioning it all off... but I don't think
> that's the case - and it's not specified?

I think what you need is a runtime check, which is provided in std.cpuid.
Apr 02 2007
Walter Bright Wrote:
> I think what you need is a runtime check, which is provided in
> std.cpuid.

So what you're saying is, we can't optimize the compiler for a specific variation of the x86, and are therefore stuck with writing generic programs that branch off for each cpu kind during runtime?

Jarrett: It would be completely pointless to virtualize SSE3, or any other x86 upgrade. They provide instructions to assist with optimization. Virtualizing it is a huge de-optimization.

If you somehow version'd off some asm, you could write asm for x86_64, one for x86 with SSE2, and one for the rest, and you'd have reasonable version control with fallbacks. If the compiler was targeting an x86 without any extensions, it would automatically choose the right code. Is it bad to allow that?
Apr 03 2007
On Tue, 03 Apr 2007 10:48:01 -0400, Dan wrote:
> So what you're saying is, we can't optimize the compiler for a
> specific variation of the x86, and are therefore stuck with writing
> generic programs that branch off for each cpu kind during runtime?

No, maybe you missed the point. It certainly is possible for one to create editions of an application for specific hardware configurations; and using the version() statement is a reasonable way to do that.

version(SSE2) { . . . }
version(SSE) { . . . }
etc...

dmd -version=SSE2 myapp.d
dmd -version=SSE myapp.d

However, such editions should be able to be generated regardless of which hardware architecture the compiler just happens to be running on at the time. In other words, setting the version values within the compiler based on the hardware at compilation time is not very useful. It would be better to set these version values at the compiler command line level, if one does really want hardware-specific editions of the app.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
Apr 03 2007
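The compile-time editions Derek describes could be laid out like this (module name, identifiers and the "edition" string are illustrative only):

```d
// myapp.d - one source file, several hardware-specific editions:
//   dmd -version=SSE2 myapp.d   -> SSE2 edition
//   dmd -version=SSE  myapp.d   -> SSE edition
//   dmd myapp.d                 -> generic edition
import std.stdio;

version (SSE2)
    const char[] edition = "SSE2";
else version (SSE)
    const char[] edition = "SSE";
else
    const char[] edition = "generic";

void main()
{
    // Each edition compiles in only its own code path; nothing is
    // selected at runtime, so the branches cost nothing.
    writefln("built as the %s edition", edition);
}
```

Because the identifiers come from the command line rather than from the compiler probing its host CPU, the SSE2 edition can be cross-built on any machine, which is Derek's point.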
Derek Parnell Wrote:
<snip>
> However, such editions should be able to be generated regardless of
> which hardware architecture the compiler just happens to be running on
> at the time. In other words, setting the version values within the
> compiler based on the hardware at compilation time is not very useful.
> It would be better to set these version values at the compiler command
> line level, if one does really want hardware-specific editions of the
> app.

Yeah, I guess defining the versions yourself is totally possible, and reasonable. : p

Less persuasively, and more as a closing note, I tend to think that the more D is used for other platforms (GDC retargetable?) the more those predefined platform ones will be wanted. It's generally bad to have each programmer use different names for exactly the same thing.
Apr 03 2007
Dan wrote:
> So what you're saying is, we can't optimize the compiler for a
> specific variation of the x86, and are therefore stuck with writing
> generic programs that branch off for each cpu kind during runtime?

Dan,
I think that there are two separate issues. The D_InlineAsm_X86 seems to mean "this compiler supports inline asm" rather than "this compiler is targeting X86". I think it's reasonable to require that any compiler which supports X86 inline asm should support *all* X86 opcodes.

However... I think we do need to work out a way of specifying the CPU target -- but that should be accessible even if inline asm is not supported. (For example, you may want to call a library function which uses SSE3, even if you're not using it yourself.)

Yes, it's possible to detect the CPU type at runtime, but the performance penalty is appalling for very short functions. I think the ideal would be an install-time linker -- link in the appropriate code during installation! The CPU type will always be the same, every time it's run.

But version names would be a great first step. Standard names for the CPU targets need to happen. Otherwise everyone will make up their own.
Apr 03 2007
Don Clugston wrote:
> Yes, it's possible to detect the CPU type at runtime, but the
> performance penalty is appalling for very short functions.

If it is, then one should put the switch at an enclosing level.
Apr 03 2007
Walter Bright wrote:
> Don Clugston wrote:
>> Yes, it's possible to detect the CPU type at runtime, but the
>> performance penalty is appalling for very short functions.
> If it is, then one should put the switch at an enclosing level.

Out of interest, which is faster: a branch at the start of a function (say, just a comparison with a bool), or using function pointers that are set up to point to the correct implementation at start-up?

-- Daniel
Apr 03 2007
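The two dispatch styles Daniel is comparing, sketched side by side (sumSSE2 is a stand-in for a real vectorized implementation):

```d
bool haveSSE2;                        // set once at startup, e.g. from std.cpuid

float sumGeneric(float[] a)
{
    float s = 0;
    foreach (x; a)
        s += x;
    return s;
}

float sumSSE2(float[] a)
{
    return sumGeneric(a);             // stand-in for a real SSE2 loop
}

// Style 1: a bool test on every call - one highly predictable branch,
// since haveSSE2 never changes after startup.
float sum1(float[] a)
{
    return haveSSE2 ? sumSSE2(a) : sumGeneric(a);
}

// Style 2: an indirect call through a pointer chosen once at start-up.
float function(float[]) sum2;

static this()
{
    sum2 = haveSSE2 ? &sumSSE2 : &sumGeneric;
}
```

Don's answer below hinges on how the hardware treats these: the style-1 branch always goes the same way, so the predictor gets it right; the style-2 indirect call was, on the CPUs of that era, often not predicted at all.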
Daniel Keep wrote:
> Out of interest, which is faster: a branch at the start of a function
> (say, just a comparison with a bool), or using function pointers that
> are set up to point to the correct implementation at start-up?

I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient.

The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.
Apr 03 2007
Don Clugston wrote:
<snip>
> The fastest option would be to patch the CALL instructions directly,
> just as a linker does. DDL will probably be able to do it eventually.

I'm starting to see what you mean. If DDL kept generalized fixup data around during runtime, it would be trivial to swap one function address for another. The only catch is that this technique would only be available to dynamic modules - the pre-linked .exe code can't be modified this way.

I'll keep this in mind.

-- 
- EricAnderton at yahoo
Apr 04 2007
Pragma wrote:
> The only catch is that this technique would only be available to
> dynamic modules - the pre-linked .exe code can't be modified this way.

Wouldn't ld's "--emit-relocs" (aka "-q") remove that limitation? From the ld man page:

=====
-q
--emit-relocs
    Leave relocation sections and contents in fully linked executables.
    Post link analysis and optimization tools may need this information
    in order to perform correct modifications of executables. This
    results in larger executables.

    This option is currently only supported on ELF platforms.
=====

I have no idea if optlink has a similar option. Is it even possible to store such information in PE files? (i.e. do they support arbitrary sections or a special section for this stuff?) I seem to remember you can append arbitrary data to PE files without breaking them, but obviously that's not ideal...
Apr 04 2007
Don Clugston wrote:
<snip>
> I suspect the bool comparison would be *much* quicker, since the
> branch is trivially predictable, and will only cost a single clock
> cycle.
<snip>

I've seen this type of thing used in Intel compiler generated code to good effect, and w/o really any adverse performance that I'm aware of.

Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches. Actually, I'm kind-of surprised one isn't there already (or is it?).

- Dave
Apr 04 2007
Dave Wrote:
> Just a thought for D -- add a global CPU type to the D standard
> runtime that is set during initialization (i.e.: in Dmain) that can be
> used by user code like BLADE and by the compiler itself for (future)
> processor determined optimization branches. Actually, I'm kind-of
> surprised one isn't there already (or is it?).

There is already a cpu variable/object/module or something within the Phobos library which provides cpu-related information. The problem being discussed is that this information is stored and branches are made based on the cpu type during *runtime*. If instead it was pre-determined during compile time, these branches could be optimized out.

Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target cpu's during compile time.
Apr 04 2007
Dan wrote:
<snip>
> Since version() is designed to perform precisely this function, it
> makes sense to continue to use version() to identify and target cpu's
> during compile time.

Yes. All that's required is for the spec to include standard names for the CPU types. I don't think any DMD compiler changes are required. How about these:

X86_MMX    // necessary? (MMX is dead technology).
X86_SSE
X86_SSE2
X86_SSE3
X86_SSSE3  // is this really necessary?
X86_SSE4

Only change would be that the GDC compiler for X86_64 should set all of the above.

Right now, the predefined version identifiers cannot be set from the command line, so eventually the compiler would need a CPU switch to control them - it would specify that the compiler has freedom to use recent opcodes, but of course it could continue to generate exactly the same code. I think such a switch would be necessary for supporting the 3- and 4-element array types with swizzle functions discussed recently.

However, if we all agreed to use the same set of version identifiers, we can get going immediately.
Apr 05 2007
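Until compilers predefine them, Don's proposed names would have to be supplied externally; since each level is meant to imply the ones below it, a build script would pass all applicable flags (e.g. dmd -version=X86_SSE -version=X86_SSE2 myapp.d). Cascading fallbacks then read:

```d
// Don's proposed identifiers - not predefined by any compiler yet,
// so they arrive via -version=... on the command line.
version (X86_SSE3)
    const char[] cpuTarget = "SSE3";
else version (X86_SSE2)
    const char[] cpuTarget = "SSE2";
else version (X86_SSE)
    const char[] cpuTarget = "SSE";
else
    const char[] cpuTarget = "baseline x86";
```

An else-chain like this always selects exactly one code path, so agreeing on the names is the only coordination needed; no compiler changes are required, just as Don says.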
Don Clugston wrote:
<snip>
> However, if we all agreed to use the same set of version identifiers,
> we can get going immediately.

I've done this manually with a set of DLLs before. Basically the exe would be a tiny file that would call the correct DLL at startup. You would only compile all the program DLLs when distributing the program. This way you only branch when you pick the correct CPU at application start.

Now I should mention Michael Abrash (of whom I'm a big fan) used a technique to simulate DirectX 7 efficiently in software where he created CPU code on the fly for different CPUs. It didn't suffer from cache misses (which are typical in self-modifying programs) because the generated data was consumed a frame later.

If the compiler was able to make simple changes to the code at startup like a virtual machine or interpreter, that would be cool. You could in essence consider it a compressed version of the DLL strategy I employed, except you'd be able to make use of combinations more effectively and optimize for AMD and Intel chips etc...

-Joel
Apr 05 2007
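The launcher-exe scheme Joel describes might be sketched like this on Windows. The DLL names, the exported appMain symbol, and the use of the D1-era std.c.windows.windows binding are all assumptions for illustration:

```d
import std.c.windows.windows;   // LoadLibraryA / GetProcAddress
import std.cpuid;

// Tiny stub exe: pick the edition matching the CPU actually present,
// load it, and hand over control.
int main()
{
    // "app_sse2.dll" / "app_x86.dll" are hypothetical edition names.
    char* dllName = std.cpuid.sse2() ? cast(char*)"app_sse2.dll"
                                     : cast(char*)"app_x86.dll";
    HMODULE h = LoadLibraryA(dllName);
    if (h is null)
        return 1;

    // "appMain" is a hypothetical entry point exported by each DLL.
    auto run = cast(int function()) GetProcAddress(h, "appMain");
    return run is null ? 1 : run();
}
```

The one-time branch happens here, in the stub; each DLL is then free to assume its instruction set unconditionally, which is exactly the property the version() approach gives at compile time.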
Dan wrote:
<snip>
> Predefining those so that compiler writers can raise errors if they
> don't yet support SSE3 instructions, but they do support x86 and MMX,
> would be a smart thing to do. It would also allow implementors to
> write code that WON'T compile and try to run if the user's CPU doesn't
> support it - preventing what could be a devastating crash.

What would be nice is if you could tell the compiler to spit out DLLs for each version (for your release build) and also a tiny exe that would run the right one.

-Joel
Apr 02 2007