
digitalmars.D - Predefined Version expansion

Dan <murpsoft hotmail.com> writes:
Following the discussion on D's hashtables, and my current understanding that D
doesn't implement any of the x86 CPU features beyond Pentium Pro (same as C and
most other things for that matter) I took another quick peek at D's predefined
Versions.

Currently, when I want to use asm {} I have to use:

version(D_InlineAsm_X86)

Unfortunately, this doesn't cover anything to do with most of the later
instruction extensions D claims to support in the spec, such as the SSE, SSE2,
SSE3, SSSE3, MMX, 3DNow! instruction sets, let alone x86_64.  Ultimately, one
would want to have a cascading support mechanism, such that if I want to use
SSSE3, I'm implying that the processor must support everything before it:

version(D_InlineAsm_X86)
version(D_InlineAsm_X86_3DNow)
version(D_InlineAsm_X86_MMX)
version(D_InlineAsm_X86_SSE)
version(D_InlineAsm_X86_SSE2)
version(D_InlineAsm_X86_SSE3)
version(D_InlineAsm_X86_SSSE3)
version(D_InlineAsm_X86_SSE4)
version(D_InlineAsm_X86_64)

Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.

Sincerely,
Dan
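For what it's worth, a cascade like this can be approximated with user-defined identifiers, because a module-level `version = X;` placed inside a `version (Y)` block switches X on whenever Y is set. A minimal sketch, assuming hypothetical identifiers `SSE3`, `SSE2`, `SSE` and `MMX` (not predefined by any compiler) given on the command line, e.g. `dmd -version=SSE3 app.d`:

```d
// Hypothetical identifiers: SSE3, SSE2, SSE and MMX are chosen by the user,
// not predefined by the compiler. Each level switches on the one below it,
// so -version=SSE3 implies SSE2, SSE and MMX as well.
// The chain runs top-down because D rejects setting a version after testing it.
version (SSE3) version = SSE2;
version (SSE2) version = SSE;
version (SSE)  version = MMX;

// Mirror the result as compile-time constants for use with `static if`.
version (SSE3) enum bool haveSSE3 = true; else enum bool haveSSE3 = false;
version (SSE2) enum bool haveSSE2 = true; else enum bool haveSSE2 = false;
version (SSE)  enum bool haveSSE  = true; else enum bool haveSSE  = false;
version (MMX)  enum bool haveMMX  = true; else enum bool haveMMX  = false;
```

Identifiers set with `version =` are local to the module that sets them, so a shared module would have to export the `enum bool` constants rather than the identifiers themselves.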
Apr 02 2007
Walter Bright <newshound1 digitalmars.com> writes:
Dan wrote:
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.
The inline assembler does support all the SSE3 instructions, etc.
Apr 02 2007
Dan <murpsoft hotmail.com> writes:
Walter Bright Wrote:
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.
The inline assembler does support all the SSE3 instructions, etc.
Yes, the GDC and DMD implementations support all the SSE3 instructions, but someone with an i486 or a Pentium Pro does not, and neither might another implementation of the D language spec.

When I use an SSE3 instruction in D, is it going to raise an error if the person's system is only a Pentium Pro?

If that could be done by checking the CPU support for each instruction implicitly, then it would be even better than versioning it all off... but I don't think that's the case - and it's not specified?
Apr 02 2007
torhu <fake address.dude> writes:
Dan wrote:
<snip>
 Yes, the GDC and DMD implementations support all the SSE3 instructions, but
someone with a i486 Pentium Pro does not, and neither might another D
implementation implementing the D language spec.
 
 When I use an SSE3 instruction in D, is it going to raise an error if the
person's system is only a Pentium Pro?
 
 If that could be done by checking the CPU support for each instruction
implicitly, then it would be even better than versioning it all off...  but I
don't think that's the case - and it's not specified?
 
Wouldn't you need runtime checking for cpu capabilities most of the time anyway? The functions in std.cpuid should get you a long way in most cases. Only works for x86, though.
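A minimal sketch of that runtime check. It assumes druntime's `core.cpuid` module, which provides the same kind of feature queries (`sse2`, `sse3`, ...) as the Phobos `std.cpuid` mentioned here; the two kernels are hypothetical placeholders written in plain D, where real code would put the SSE2 `asm {}` block.

```d
import core.cpuid : sse2;

// Hypothetical generic kernel: plain D, runs on any CPU.
int sumGeneric(const(int)[] a) pure nothrow @safe
{
    int total = 0;
    foreach (x; a)
        total += x;
    return total;
}

// Hypothetical SSE2 kernel: a placeholder that reuses the generic code;
// a real one would contain the SSE2 asm {} block.
int sumSSE2(const(int)[] a) pure nothrow @safe
{
    return sumGeneric(a);
}

// Runtime dispatch: the CPU feature flag is consulted on every call.
int sum(const(int)[] a)
{
    return sse2 ? sumSSE2(a) : sumGeneric(a);
}
```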
Apr 02 2007
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Dan" <murpsoft hotmail.com> wrote in message 
news:eurruh$26gc$1 digitalmars.com...
 Walter Bright Wrote:

 Yes, the GDC and DMD implementations support all the SSE3 instructions, 
 but someone with a i486 Pentium Pro does not, and neither might another D 
 implementation implementing the D language spec.

 When I use an SSE3 instruction in D, is it going to raise an error if the 
 person's system is only a Pentium Pro?

 If that could be done by checking the CPU support for each instruction 
 implicitly, then it would be even better than versioning it all off... 
 but I don't think that's the case - and it's not specified?
Just because their machine doesn't support it, why wouldn't you allow someone to use SSE3 instructions? For that matter, who says you have to be running the compiler on an x86 platform to compile to x86 machine code?
Apr 02 2007
Walter Bright <newshound1 digitalmars.com> writes:
Dan wrote:
 Walter Bright Wrote:
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.
The inline assembler does support all the SSE3 instructions, etc.
Yes, the GDC and DMD implementations support all the SSE3 instructions, but someone with a i486 Pentium Pro does not, and neither might another D implementation implementing the D language spec.

When I use an SSE3 instruction in D, is it going to raise an error if the person's system is only a Pentium Pro?

If that could be done by checking the CPU support for each instruction implicitly, then it would be even better than versioning it all off... but I don't think that's the case - and it's not specified?
I think what you need is a runtime check, which is provided in std.cpuid.
Apr 02 2007
Dan <murpsoft hotmail.com> writes:
Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.
So what you're saying is, we can't optimize the compiler for a specific variation of the x86, and are therefore stuck with writing generic programs that branch off for each CPU kind at runtime?

Jarrett: It would be completely pointless to virtualize SSE3, or any other x86 upgrade. They provide instructions to assist with optimization; virtualizing them is a huge de-optimization.

If you somehow version'd off some asm, you could write one asm block for x86_64, one for x86 with SSE2, and one for the rest, and you'd have reasonable version control with fallbacks. If the compiler was targeting an x86 without any extensions, it would automatically choose the right code. Is it bad to allow that?
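A sketch of that fallback chain, assuming `X86_64` (predefined by D compilers targeting x86-64, which always includes SSE2) and a hypothetical user identifier `SSE2` passed as `-version=SSE2`. Only one branch is compiled, so no CPU test remains in the binary; the `kernelName` functions are hypothetical placeholders for the real `asm {}` bodies.

```d
version (X86_64)
{
    // x86-64 guarantees SSE2; the x86_64 asm {} block would go here.
    string kernelName() pure nothrow @safe { return "x86_64"; }
}
else version (SSE2)
{
    // Hypothetical -version=SSE2 build: 32-bit x86 asm {} using SSE2.
    string kernelName() pure nothrow @safe { return "x86 + SSE2"; }
}
else
{
    // Plain D fallback for every other target.
    string kernelName() pure nothrow @safe { return "generic"; }
}
```

Building twice, once with and once without `-version=SSE2`, yields two editions of the same program from one source file.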
Apr 03 2007
Derek Parnell <derek psych.ward> writes:
On Tue, 03 Apr 2007 10:48:01 -0400, Dan wrote:

 Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.
So what you're saying is, we can't optimize the compiler for a specific variation of the x86, and are therefore stuck with writing generic programs that branch off for each cpu kind during runtime?
No, maybe you missed the point. It certainly is possible for one to create editions of an application for specific hardware configurations; and using the version() statement is a reasonable way to do that.

   version(SSE2)
   {
      . . .
   }
   version(SSE)
   {
      . . .
   }
   etc...

   dmd -version=SSE2 myapp.d
   dmd -version=SSE myapp.d

However, such editions should be able to be generated regardless of which hardware architecture the compiler just happens to be running on at the time. In other words, setting the version values within the compiler based on the hardware at compilation time is not very useful. It would be better to set these version values at the compiler command line level, if one does really want hardware-specific editions of the app.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell
Apr 03 2007
Dan <murpsoft hotmail.com> writes:
Derek Parnell Wrote:

 On Tue, 03 Apr 2007 10:48:01 -0400, Dan wrote:
 
 Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.
So what you're saying is, we can't optimize the compiler for a specific variation of the x86, and are therefore stuck with writing generic programs that branch off for each cpu kind during runtime?
 It would be better to set these version values at the compiler command line level, if one does really want hardware-specific editions of the app.
Yeah, I guess defining the versions yourself is totally possible, and reasonable. : p

Less persuasively, and more as a closing note, I tend to think that the more D is used for other platforms (GDC retargetable?) the more those predefined platform version identifiers will be wanted. It's generally bad to have each programmer use different names for exactly the same thing.
Yeah, I guess defining the versions yourself is totally possible, and reasonable. : p Less persuasively, and more as a closing note, I tend to think that the more D is used for other platforms (GDC retargetable?) the more those predefined platform ones will be wanted. It's generally bad to have each programmer use different names for exactly the same thing.
Apr 03 2007
Don Clugston <dac nospam.com.au> writes:
Dan wrote:
 Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.
So what you're saying is, we can't optimize the compiler for a specific variation of the x86, and are therefore stuck with writing generic programs that branch off for each cpu kind during runtime?
Dan, I think that there are two separate issues. The D_InlineAsm_X86 version seems to mean "this compiler supports inline asm" rather than "this compiler is targeting X86". I think it's reasonable to require that any compiler which supports X86 inline asm should support *all* X86 opcodes.

However... I think we do need to work out a way of specifying the CPU target -- but that should be accessible even if inline asm is not supported. (For example, you may want to call a library function which uses SSE3, even if you're not using it yourself.)

Yes, it's possible to detect the CPU type at runtime, but the performance penalty is appalling for very short functions. I think the ideal would be an install-time linker -- link in the appropriate code during installation! The CPU type will always be the same, every time it's run.

But version names would be a great first step. Standard names for the CPU targets need to happen. Otherwise everyone will make up their own.
Apr 03 2007
Walter Bright <newshound1 digitalmars.com> writes:
Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the 
 performance penalty is appalling for very short functions.
If it is, then one should put the switch at an enclosing level.
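A minimal sketch of moving the check to an enclosing level, assuming druntime's `core.cpuid` (the successor to the `std.cpuid` module named in this thread) and hypothetical per-element kernels `stepGeneric` and `stepSSE2`. The loop is a template instantiated once per kernel, so each instantiation runs without any CPU test inside it:

```d
import core.cpuid : sse2;

// Hypothetical per-element kernels; the SSE2 one is a placeholder
// doing the same arithmetic as the generic one.
int stepGeneric(int x) pure nothrow @safe { return x * 3 + 1; }
int stepSSE2(int x) pure nothrow @safe { return x * 3 + 1; }

// The kernel is a compile-time parameter, so the loop body holds no branch
// on the CPU type.
void applyAll(alias step)(int[] data) pure nothrow @safe
{
    foreach (ref x; data)
        x = step(x);
}

// One CPU check per call of the enclosing routine, not one per element.
void process(int[] data)
{
    if (sse2)
        applyAll!stepSSE2(data);
    else
        applyAll!stepGeneric(data);
}
```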
Apr 03 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.
If it is, then one should put the switch at an enclosing level.
Out of interest, which is faster: a branch at the start of a function (say, just a comparison with a bool), or using function pointers that are set up to point to the correct implementation at start-up?

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP http://hackerkey.com/
Apr 03 2007
Don Clugston <dac nospam.com.au> writes:
Daniel Keep wrote:
 
 Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.
If it is, then one should put the switch at an enclosing level.
Out of interest, which is faster: a branch at the start of a function (say, just a comparison with a bool), or using function pointers that are set up to point to the correct implementation at start-up? -- Daniel
I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient. The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.
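The two start-up strategies compared here can be sketched side by side. This assumes druntime's `core.cpuid` for the feature query (the thread's `std.cpuid` played the same role) and hypothetical kernels `dotGeneric` and `dotSSE2`; `shared static this()` runs once before `main`, so both the flag and the pointer are resolved exactly once:

```d
import core.cpuid : sse2;

// Hypothetical kernels: the "SSE2" one is a placeholder that reuses the
// generic code; a real one would hold the SSE2 asm {} block.
int dotGeneric(const(int)[] a, const(int)[] b) pure nothrow @safe
{
    int s = 0;
    foreach (i; 0 .. a.length)
        s += a[i] * b[i];
    return s;
}

int dotSSE2(const(int)[] a, const(int)[] b) pure nothrow @safe
{
    return dotGeneric(a, b);
}

// Option 1: a flag computed once, tested on every call (a direct branch).
immutable bool useSSE2;

// Option 2: a function pointer resolved once, called indirectly on every call.
__gshared int function(const(int)[], const(int)[]) pure nothrow @safe dotImpl;

// Runs once at start-up, before main.
shared static this()
{
    useSSE2 = sse2;
    dotImpl = useSSE2 ? &dotSSE2 : &dotGeneric;
}

int dotBranch(const(int)[] a, const(int)[] b)
{
    return useSSE2 ? dotSSE2(a, b) : dotGeneric(a, b);
}

int dotPointer(const(int)[] a, const(int)[] b)
{
    return dotImpl(a, b);
}
```

Which option is faster depends on how well the CPU predicts direct versus indirect branches; the sketch only shows the shape of each option, not a measurement.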
Apr 03 2007
Pragma <ericanderton yahoo.removeme.com> writes:
Don Clugston wrote:
 Daniel Keep wrote:
 Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.
If it is, then one should put the switch at an enclosing level.
Out of interest, which is faster: a branch at the start of a function (say, just a comparison with a bool), or using function pointers that are set up to point to the correct implementation at start-up? -- Daniel
I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient. The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.
I'm starting to see what you mean. If DDL kept generalized fixup data around during runtime, it would be trivial to swap one function address for another. The only catch is that this technique would only be available to dynamic modules - the pre-linked .exe code can't be modified this way. I'll keep this in mind.

-- 
- EricAnderton at yahoo
Apr 04 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Pragma wrote:
 for another.  The only catch is that this technique would only be 
 available to dynamic modules - the pre-linked .exe code can't be 
 modified this way.
Wouldn't ld's "--emit-relocs" (aka "-q") remove that limitation? From the ld man page:
=====
-q
--emit-relocs
    Leave relocation sections and contents in fully linked executables. Post link analysis and optimization tools may need this information in order to perform correct modifications of executables. This results in larger executables.

    This option is currently only supported on ELF platforms.
=====
I have no idea if optlink has a similar option. Is it even possible to store such information in PE files? (i.e. do they support arbitrary sections or a special section for this stuff?) I seem to remember you can append arbitrary data to PE files without breaking them, but obviously that's not ideal...
Apr 04 2007
Dave <Dave_member pathlink.com> writes:
Don Clugston wrote:
 Daniel Keep wrote:
 Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.
If it is, then one should put the switch at an enclosing level.
Out of interest, which is faster: a branch at the start of a function (say, just a comparison with a bool), or using function pointers that are set up to point to the correct implementation at start-up? -- Daniel
I suspect the bool comparison would be *much* quicker, since the branch is trivially predictable, and will only cost a single clock cycle. AFAIK, it's only in the past two years that any CPUs have had branch prediction for indirect branches. OTOH, the version involving branches would probably be less code-cache efficient. The fastest option would be to patch the CALL instructions directly, just as a linker does. DDL will probably be able to do it eventually.
I've seen this type of thing used in Intel compiler generated code to good effect, and w/o really any adverse performance that I'm aware of.

Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.

Actually, I'm kind-of surprised one isn't there already (or is it?).

- Dave
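A sketch of such a global, assuming druntime's `core.cpuid` (the module that replaced Phobos's `std.cpuid`) and a hypothetical ordered `CpuLevel` enum. The classification is a pure function, so it can be checked at compile time, while the actual value is filled in once by a module constructor before `main` runs:

```d
import core.cpuid : sse, sse2, sse3, ssse3;

// Hypothetical ordered CPU level, so code can ask "at least SSE2?".
enum CpuLevel { base, sse, sse2, sse3, ssse3 }

// Returns the highest level reported; assumes each extension implies the
// earlier ones, as on Intel and AMD x86 processors.
CpuLevel classify(bool hasSSE, bool hasSSE2, bool hasSSE3, bool hasSSSE3)
    pure nothrow @safe @nogc
{
    if (hasSSSE3) return CpuLevel.ssse3;
    if (hasSSE3)  return CpuLevel.sse3;
    if (hasSSE2)  return CpuLevel.sse2;
    if (hasSSE)   return CpuLevel.sse;
    return CpuLevel.base;
}

// Set once at start-up; read-only afterwards.
immutable CpuLevel cpuLevel;

shared static this()
{
    cpuLevel = classify(sse, sse2, sse3, ssse3);
}
```

User code can then test `cpuLevel >= CpuLevel.sse2` once, at an outer level, and pick an implementation.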
Apr 04 2007
Dan <murpsoft hotmail.com> writes:
Dave Wrote:
 Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.
 
 Actually, I'm kind-of surprised one isn't there already (or is it?).
There is already a CPU module (std.cpuid) within the Phobos library which provides CPU-related information. The problem being discussed is that this information is gathered, and branches are made based on the CPU type, at *runtime*. If instead it was pre-determined at compile time, these branches could be optimized out. Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target CPUs at compile time.
Apr 04 2007
Don Clugston <dac nospam.com.au> writes:
Dan wrote:
 Dave Wrote:
 Just a thought for D -- add a global CPU type to the D standard runtime that is set during initialization (i.e.: in Dmain) that can be used by user code like BLADE and by the compiler itself for (future) processor determined optimization branches.

 Actually, I'm kind-of surprised one isn't there already (or is it?).
There is already a cpu variable/object/module or something within the phobos library which provides cpu related information. The problem being discussed is that this information is stored and branches are made based on the cpu type during *runtime*. If instead, it was pre-determined during compile time, these branches could be optimized out. Since version() is designed to perform precisely this function, it makes sense to continue to use version() to identify and target cpu's during compile time.
Yes. All that's required is for the spec to include standard names for the CPU types. I don't think any DMD compiler changes are required. How about these:

X86_MMX     // necessary? (MMX is dead technology).
X86_SSE
X86_SSE2
X86_SSE3
X86_SSSE3   // is this really necessary?
X86_SSE4

Only change would be that the GDC compiler for X86_64 should set all of the above.

Right now, the predefined version identifiers cannot be set from the command line, so eventually the compiler would need a CPU switch to control them - it would specify that the compiler has freedom to use recent opcodes, but of course it could continue to generate exactly the same code. I think such a switch would be necessary for supporting the 3- and 4-element array types with swizzle functions discussed recently.

However, if we all agreed to use the same set of version identifiers, we can get going immediately.
Apr 05 2007
janderson <askme me.com> writes:
Don Clugston wrote:
Yes. All that's required is for the spec to include standard names for the CPU types. I don't think any DMD compiler changes are required. How about these:

X86_MMX     // necessary? (MMX is dead technology).
X86_SSE
X86_SSE2
X86_SSE3
X86_SSSE3   // is this really necessary?
X86_SSE4

Only change would be that the GDC compiler for X86_64 should set all of the above.

Right now, the predefined version identifiers cannot be set from the command line, so eventually the compiler would need a CPU switch to control them - it would specify that the compiler has freedom to use recent opcodes, but of course it could continue to generate exactly the same code. I think such a switch would be necessary for supporting the 3- and 4-element array types with swizzle functions discussed recently.

However, if we all agreed to use the same set of version identifiers, we can get going immediately.
I've done this manually with a set of DLLs before. Basically the exe would be a tiny file that would call the correct DLL at startup. You would only compile all the program DLLs when distributing the program. This way you only branch when you pick the correct CPU at application start.

Now I should mention Michael Abrash (of whom I'm a big fan) used a technique to simulate DirectX 7 efficiently in software, where he generated CPU code on the fly for different CPUs. It didn't suffer from cache misses (which are typical in self-modifying programs) because the generated data was consumed a frame later.

If the compiler was able to make simple changes to the code at startup, like a virtual machine or interpreter, that would be cool. You could in essence consider it a compressed version of the DLL strategy I employed, except you'd be able to make use of combinations more effectively and optimize for AMD, Intel chips, etc.

-Joel
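A sketch of the launcher side of that scheme, assuming druntime's `core.cpuid` for detection and hypothetical DLL names (`app_sse3.dll` and so on). Only the choice is shown; actually loading the library (for instance with `Runtime.loadLibrary` from `core.runtime`) is left out:

```d
import core.cpuid : sse2, sse3;

// Hypothetical file names for per-CPU builds of the same program,
// picked from the most capable build the CPU can run.
string libraryFor(bool hasSSE2, bool hasSSE3) pure nothrow @safe
{
    if (hasSSE3) return "app_sse3.dll";
    if (hasSSE2) return "app_sse2.dll";
    return "app_generic.dll";
}

// What the tiny launcher would load at start-up: one CPU check per run.
string chooseLibrary()
{
    return libraryFor(sse2, sse3);
}
```

Keeping the decision in a pure function separate from the CPU query makes the selection rule easy to test without the target hardware.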
Apr 05 2007
janderson <askme me.com> writes:
Dan wrote:
 version(D_InlineAsm_x86)
 version(D_InlineAsm_x86_3DNow)
 version(D_InlineAsm_x86_MMX)
 version(D_InlineAsm_x86_SSE)
 version(D_InlineAsm_x86_SSE2)
 version(D_InlineAsm_x86_SSE3)
 version(D_InlineAsm_x86_SSSE3)
 version(D_InlineAsm_x86_SSE4)
 version(D_InlineAsm_x86_64)
 
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.
 
 Sincerely,
 Dan
What would be nice is if you could tell the compiler to spit out DLLs for each version (for your release build) and also a tiny exe that would run the right one.

-Joel
Apr 02 2007