digitalmars.D - Predefined Version expansion

Dan <murpsoft hotmail.com> writes:

Following the discussion on D's hashtables, and my current understanding that D
doesn't implement any of the x86 CPU features beyond Pentium Pro (same as C and
most other things for that matter) I took another quick peek at D's predefined
Versions.

Currently, when I want to use asm {} I have to use:

version(D_InlineAsm_X86)

Unfortunately, this doesn't cover anything to do with most of the later
instruction extensions D claims to support in the spec, such as the SSE, SSE2,
SSE3, SSSE3, MMX, 3DNow! instruction sets, let alone x86_64.  Ultimately, one
would want to have a cascading support mechanism, such that if I want to use
SSSE3, I'm implying that the processor must support everything before it:

version(D_InlineAsm_x86)
version(D_InlineAsm_x86_3DNow)
version(D_InlineAsm_x86_MMX)
version(D_InlineAsm_x86_SSE)
version(D_InlineAsm_x86_SSE2)
version(D_InlineAsm_x86_SSE3)
version(D_InlineAsm_x86_SSSE3)
version(D_InlineAsm_x86_SSE4)
version(D_InlineAsm_x86_64)

Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.
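
A minimal sketch of how asm guarded this way can look with only the existing
D_InlineAsm_X86 identifier. The function name addOne is made up for
illustration, and the else branch is what any target without 32-bit x86 inline
asm (x86_64 included) would compile instead:

```d
import std.stdio : writeln;

// Hypothetical helper: adds one to x, using inline asm only where the
// compiler predefines D_InlineAsm_X86 (32-bit x86 inline assembler).
int addOne(int x)
{
    version (D_InlineAsm_X86)
    {
        int input = x;   // copy to a local so the asm block can name it
        int result;
        asm
        {
            mov EAX, input;
            inc EAX;
            mov result, EAX;
        }
        return result;
    }
    else
    {
        // Portable fallback for every other target.
        return x + 1;
    }
}

void main()
{
    writeln(addOne(41));
}
```

The identifier only says the compiler accepts x86 inline asm; it says nothing
about which instruction-set extensions the CPU running the program has.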

Sincerely,
Dan

Apr 02 2007

Walter Bright <newshound1 digitalmars.com> writes:

Dan wrote:
 Following the discussion on D's hashtables, and my current understanding that
D doesn't implement any of the x86 CPU features beyond Pentium Pro (same as C
and most other things for that matter) I took another quick peek at D's
predefined Versions.
 
 Currently, when I want to use asm {} I have to use:
 
 version(D_InlineAsm_X86)
 
 Unfortunately, this doesn't cover anything to do with most of the later
instruction extensions D claims to support in the spec, such as the SSE, SSE2,
SSE3, SSSE3, MMX, 3DNow! instruction sets, let alone x86_64.  Ultimately, one
would want to have a cascading support mechanism, such that if I want to use
SSSE3, I'm implying that the processor must support everything before it:
 
 version(D_InlineAsm_x86)
 version(D_InlineAsm_x86_3DNow)
 version(D_InlineAsm_x86_MMX)
 version(D_InlineAsm_x86_SSE)
 version(D_InlineAsm_x86_SSE2)
 version(D_InlineAsm_x86_SSE3)
 version(D_InlineAsm_x86_SSSE3)
 version(D_InlineAsm_x86_SSE4)
 version(D_InlineAsm_x86_64)
 
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.

The inline assembler does support all the SSE3 instructions, etc.

Apr 02 2007

Dan <murpsoft hotmail.com> writes:

Walter Bright Wrote:
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.

 
 The inline assembler does support all the SSE3 instructions, etc.

Yes, the GDC and DMD implementations support all the SSE3 instructions, but
someone with an i686 Pentium Pro does not, and neither might another D
implementation implementing the D language spec.

When I use an SSE3 instruction in D, is it going to raise an error if the
person's system is only a Pentium Pro?

If that could be done by checking the CPU support for each instruction
implicitly, then it would be even better than versioning it all off...  but I
don't think that's the case - and it's not specified?

Apr 02 2007

torhu <fake address.dude> writes:

Dan wrote:
<snip>
 Yes, the GDC and DMD implementations support all the SSE3 instructions, but
someone with an i686 Pentium Pro does not, and neither might another D
implementation implementing the D language spec.
 
 When I use an SSE3 instruction in D, is it going to raise an error if the
person's system is only a Pentium Pro?
 
 If that could be done by checking the CPU support for each instruction
implicitly, then it would be even better than versioning it all off...  but I
don't think that's the case - and it's not specified?
 

Wouldn't you need runtime checking for cpu capabilities most of the time 
anyway?  The functions in std.cpuid should get you a long way in most 
cases.  Only works for x86, though.

Apr 02 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Dan" <murpsoft hotmail.com> wrote in message 
news:eurruh$26gc$1 digitalmars.com...
 Walter Bright Wrote:

 Yes, the GDC and DMD implementations support all the SSE3 instructions, 
 but someone with an i686 Pentium Pro does not, and neither might another D 
 implementation implementing the D language spec.

 When I use an SSE3 instruction in D, is it going to raise an error if the 
 person's system is only a Pentium Pro?

 If that could be done by checking the CPU support for each instruction 
 implicitly, then it would be even better than versioning it all off... 
 but I don't think that's the case - and it's not specified?

Just because their machine doesn't support it, why wouldn't you allow 
someone to use SSE3 instructions?  For that matter, who says you have to be 
running the compiler on an x86 platform to compile to x86 machine code?

Apr 02 2007

Walter Bright <newshound1 digitalmars.com> writes:

Dan wrote:
 Walter Bright Wrote:
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.

 The inline assembler does support all the SSE3 instructions, etc.

 
 Yes, the GDC and DMD implementations support all the SSE3 instructions, but
someone with an i686 Pentium Pro does not, and neither might another D
implementation implementing the D language spec.
 
 When I use an SSE3 instruction in D, is it going to raise an error if the
person's system is only a Pentium Pro?
 
 If that could be done by checking the CPU support for each instruction
implicitly, then it would be even better than versioning it all off...  but I
don't think that's the case - and it's not specified?

I think what you need is a runtime check, which is provided in std.cpuid.
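
A rough sketch of such a runtime check. It assumes a current D toolchain,
where (as far as I know) the equivalent queries live in druntime's core.cpuid
rather than std.cpuid; pickPath and the path names are invented for
illustration:

```d
import core.cpuid : sse2, sse3;
import std.stdio : writeln;

// Hypothetical selector: maps detected CPU features to a code-path name.
string pickPath(bool hasSse3, bool hasSse2)
{
    if (hasSse3) return "sse3";
    if (hasSse2) return "sse2";
    return "generic";
}

void main()
{
    // sse3 and sse2 are queried when the program runs, not when it is compiled.
    writeln("selected path: ", pickPath(sse3, sse2));
}
```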

Apr 02 2007

Dan <murpsoft hotmail.com> writes:

Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.

So what you're saying is, we can't optimize the compiler for a specific
variation of the x86, and are therefore stuck with writing generic programs
that branch off for each cpu kind during runtime?

 Jarrett:  It would be completely pointless to virtualize SSE3, or any other
x86 upgrade.  They provide instructions to assist with optimization. 
Virtualizing it is a huge de-optimization.  

If you somehow version'd off some asm, you could write asm for x86_64, one for
x86 with SSE2, and one for the rest, and you'd have reasonable version control
with fallbacks.  If the compiler was targeting an x86 without any extensions,
it would automatically choose the right code.

Is it bad to allow that?

Apr 03 2007

Derek Parnell <derek psych.ward> writes:

On Tue, 03 Apr 2007 10:48:01 -0400, Dan wrote:

 Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.

 
 So what you're saying is, we can't optimize the compiler for a specific
 variation of the x86, and are therefore stuck with writing generic 
 programs that branch off for each cpu kind during runtime?


No, maybe you missed the point. It certainly is possible for one to create
editions of an application for specific hardware configurations; and using
the version() statement is a reasonable way to do that.

  version(SSE2) { . . . }
  version(SSE)  { . . . }
etc...

  dmd -version=SSE2 myapp.d
  dmd -version=SSE myapp.d

However, such editions should be able to be generated regardless of which
hardware architecture the compiler just happens to be running on at the
time. In other words, setting the version values within the compiler based
on the hardware at compilation time is not very useful. It would be better
to set these version values at the compiler command line level, if one does
really want hardware-specific editions of the app.
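
Spelled out a little more, as a sketch only: SSE2 and SSE here are user-chosen
identifiers (not predefined by any compiler), and edition is a made-up function
name.

```d
import std.stdio : writeln;

// Reports which edition of the application this build is.
// Exactly one branch survives compilation; the others are discarded.
string edition()
{
    version (SSE2)
    {
        return "SSE2 edition";
    }
    else version (SSE)
    {
        return "SSE edition";
    }
    else
    {
        return "generic edition";
    }
}

void main()
{
    writeln(edition());
}
```

Building with -version=SSE2 or -version=SSE as shown above selects the matching
branch; building with neither flag gives the generic edition.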


-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Apr 03 2007

Dan <murpsoft hotmail.com> writes:

Derek Parnell Wrote:

 On Tue, 03 Apr 2007 10:48:01 -0400, Dan wrote:
 
 Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.

 
 So what you're saying is, we can't optimize the compiler for a specific
 variation of the x86, and are therefore stuck with writing generic 
 programs that branch off for each cpu kind during runtime?

 
 
 No, maybe you missed the point. It certainly is possible for one to create
 editions of an application for specific hardware configurations; and using
 the version() statement is a reasonable way to do that.
 
   version(SSE2) { . . . }
   version(SSE)  { . . . }
 etc...
 
   dmd -version=SSE2 myapp.d
   dmd -version=SSE myapp.d
 
 However, such editions should be able to be generated regardless of which
 hardware architecture the compiler just happens to be running on at the
 time. In other words, setting the version values within the compiler based
 on the hardware at compilation time is not very useful. It would be better
 to set these version values at the compiler command line level, if one does
 really want hardware-specific editions of the app.

Yeah, I guess defining the versions yourself is totally possible, and
reasonable.  : p

Less persuasively, and more as a closing note, I tend to think that the more D
is used for other platforms (GDC retargetable?) the more those predefined
platform ones will be wanted.  

It's generally bad to have each programmer use different names for exactly the
same thing.

Apr 03 2007

Don Clugston <dac nospam.com.au> writes:

Dan wrote:
 Walter Bright Wrote:
 I think what you need is a runtime check, which is provided in std.cpuid.

 
 So what you're saying is, we can't optimize the compiler for a specific
variation of the x86, and are therefore stuck with writing generic programs
that branch off for each cpu kind during runtime?

Dan,
I think that there are two separate issues. The D_InlineAsm_X86 seems to 
mean "this compiler supports inline asm" rather than "this compiler is 
targeting X86". I think it's reasonable to require that any compiler 
which supports X86 inline asm should support *all* X86 opcodes.

However...
I think we do need to work out a way of specifying the CPU target -- but 
that should be accessible even if inline asm is not supported. (For 
example, you may want to call a library function which uses SSE3, even 
if you're not using it yourself).
Yes, it's possible to detect the CPU type at runtime, but the 
performance penalty is appalling for very short functions. I think the 
ideal would be an install-time linker -- link in the appropriate code 
during installation! The CPU type will always be the same, every time 
it's run. But version names would be a great first step.

Standard names for the CPU targets need to happen. Otherwise everyone 
will make up their own.

Apr 03 2007

Walter Bright <newshound1 digitalmars.com> writes:

Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the 
 performance penalty is appalling for very short functions.

If it is, then one should put the switch at an enclosing level.

Apr 03 2007

Daniel Keep <daniel.keep.lists gmail.com> writes:

Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.

 
 If it is, then one should put the switch at an enclosing level.

Out of interest, which is faster: a branch at the start of a function
(say, just a comparison with a bool), or using function pointers that
are set up to point to the correct implementation at start-up?

	-- Daniel

-- 
int getRandomNumber()
{
    return 4; // chosen by fair dice roll.
              // guaranteed to be random.
}

http://xkcd.com/

v2sw5+8Yhw5ln4+5pr6OFPma8u6+7Lw4Tm6+7l6+7D
i28a2Xs3MSr2e4/6+7t4TNSMb6HTOp5en5g6RAHCP  http://hackerkey.com/

Apr 03 2007

Don Clugston <dac nospam.com.au> writes:

Daniel Keep wrote:
 
 Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.

 If it is, then one should put the switch at an enclosing level.

 
 Out of interest, which is faster: a branch at the start of a function
 (say, just a comparison with a bool), or using function pointers that
 are set up to point to the correct implementation at start-up?
 
 	-- Daniel

I suspect the bool comparison would be *much* quicker, since the branch 
is trivially predictable, and will only cost a single clock cycle. 
AFAIK, it's only in the past two years that any CPUs have had branch 
prediction for indirect branches. OTOH, the version involving branches 
would probably be less code-cache efficient.
The fastest option would be to patch the CALL instructions directly, 
just as a linker does. DDL will probably be able to do it eventually.
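
For anyone who wants to measure it, a sketch of the two options being compared.
All names are invented, sumTuned is only a stand-in for a real SSE2 routine,
and the core.cpuid import assumes a current D toolchain:

```d
import core.cpuid : sse2;
import std.stdio : writeln;

// Stand-ins for a generic and a CPU-specific implementation.
int sumGeneric(const(int)[] a)
{
    int s = 0;
    foreach (v; a) s += v;
    return s;
}

int sumTuned(const(int)[] a)
{
    // A real version would use SSE2 here; this stand-in mirrors the result.
    return sumGeneric(a);
}

__gshared bool useTuned = false;                             // option 1 state
__gshared int function(const(int)[]) sumPtr = &sumGeneric;   // option 2 state

// Called once at start-up with the detected capability.
void initDispatch(bool tuned)
{
    useTuned = tuned;
    sumPtr = tuned ? &sumTuned : &sumGeneric;
}

// Option 1: a bool comparison at the start of every call.
int sumBranch(const(int)[] a)
{
    return useTuned ? sumTuned(a) : sumGeneric(a);
}

void main()
{
    initDispatch(sse2);
    auto data = [1, 2, 3, 4];
    // Option 2 is the indirect call through sumPtr.
    writeln(sumBranch(data), " ", sumPtr(data));
}
```

Which one wins would have to be timed on the CPUs of interest; the sketch only
shows the shape of each approach.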

Apr 03 2007

Pragma <ericanderton yahoo.removeme.com> writes:

Don Clugston wrote:
 Daniel Keep wrote:
 Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.

 If it is, then one should put the switch at an enclosing level.

 Out of interest, which is faster: a branch at the start of a function
 (say, just a comparison with a bool), or using function pointers that
 are set up to point to the correct implementation at start-up?

     -- Daniel

 
 I suspect the bool comparison would be *much* quicker, since the branch 
 is trivially predictable, and will only cost a single clock cycle. 
 AFAIK, it's only in the past two years that any CPUs have had branch 
 prediction for indirect branches. OTOH, the version involving branches 
 would probably be less code-cache efficient.
 The fastest option would be to patch the CALL instructions directly, 
 just as a linker does. DDL will probably be able to do it eventually.

I'm starting to see what you mean.  If DDL kept generalized fixup data around
during runtime, it would be trivial to 
swap one function address for another.  The only catch is that this technique
would only be available to dynamic modules 
- the pre-linked .exe code can't be modified this way.

I'll keep this in mind.

-- 
- EricAnderton at yahoo

Apr 04 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Pragma wrote:
 for another.  The only catch is that this technique would only be 
 available to dynamic modules - the pre-linked .exe code can't be 
 modified this way.

Wouldn't ld's "--emit-relocs" (aka "-q") remove that limitation? From 
the ld man page:
=====
-q
--emit-relocs
     Leave relocation sections and contents in fully linked executables.
     Post link analysis and optimization tools may need this information
     in order to perform correct modifications of executables. This
     results in larger executables.

     This option is currently only supported on ELF platforms.
=====

I have no idea if optlink has a similar option.
Is it even possible to store such information in PE files? (i.e. do they 
support arbitrary sections or a special section for this stuff?) I seem 
to remember you can append arbitrary data to PE files without breaking 
them, but obviously that's not ideal...

Apr 04 2007

Dave <Dave_member pathlink.com> writes:

Don Clugston wrote:
 Daniel Keep wrote:
 Walter Bright wrote:
 Don Clugston wrote:
 Yes, it's possible to detect the CPU type at runtime, but the
 performance penalty is appalling for very short functions.

 If it is, then one should put the switch at an enclosing level.

 Out of interest, which is faster: a branch at the start of a function
 (say, just a comparison with a bool), or using function pointers that
 are set up to point to the correct implementation at start-up?

     -- Daniel

 
 I suspect the bool comparison would be *much* quicker, since the branch 
 is trivially predictable, and will only cost a single clock cycle. 
 AFAIK, it's only in the past two years that any CPUs have had branch 
 prediction for indirect branches. OTOH, the version involving branches 
 would probably be less code-cache efficient.
 The fastest option would be to patch the CALL instructions directly, 
 just as a linker does. DDL will probably be able to do it eventually.

I've seen this type of thing used in Intel compiler generated code to good
effect, and w/o really 
any adverse performance that I'm aware of.

Just a thought for D -- add a global CPU type to the D standard runtime that is
set during 
initialization (i.e.: in Dmain) that can be used by user code like BLADE and by
the compiler itself 
for (future) processor determined optimization branches.

Actually, I'm kind-of surprised one isn't there already (or is it?).

- Dave

Apr 04 2007

Dan <murpsoft hotmail.com> writes:

Dave Wrote:
 Just a thought for D -- add a global CPU type to the D standard runtime that
is set during 
 initialization (i.e.: in Dmain) that can be used by user code like BLADE and
by the compiler itself 
 for (future) processor determined optimization branches.
 
 Actually, I'm kind-of surprised one isn't there already (or is it?).

There is already a cpu variable/object/module or something within the phobos
library which provides cpu related information.  The problem being discussed is
that this information is stored and branches are made based on the cpu type
during *runtime*.

If instead, it was pre-determined during compile time, these branches could be
optimized out.  Since version() is designed to perform precisely this function,
it makes sense to continue to use version() to identify and target cpu's during
compile time.

Apr 04 2007

Don Clugston <dac nospam.com.au> writes:

Dan wrote:
 Dave Wrote:
 Just a thought for D -- add a global CPU type to the D standard runtime that
is set during 
 initialization (i.e.: in Dmain) that can be used by user code like BLADE and
by the compiler itself 
 for (future) processor determined optimization branches.

 Actually, I'm kind-of surprised one isn't there already (or is it?).

 
 There is already a cpu variable/object/module or something within the phobos
library which provides cpu related information.  The problem being discussed is
that this information is stored and branches are made based on the cpu type
during *runtime*.
 
 If instead, it was pre-determined during compile time, these branches could be
optimized out.  Since version() is designed to perform precisely this function,
it makes sense to continue to use version() to identify and target cpu's during
compile time.

Yes. All that's required is for the spec to include standard names for 
the CPU types. I don't think any DMD compiler changes are required.

How about these:

X86_MMX   // necessary? (MMX is dead technology).
X86_SSE
X86_SSE2
X86_SSE3
X86_SSSE3 // is this really necessary?
X86_SSE4

Only change would be that the GDC compiler for X86_64 should set all of 
the above.
Right now, the predefined version identifiers cannot be set from the 
command line, so eventually the compiler would need a CPU switch to 
control them - it would specify that the compiler has freedom to use 
recent opcodes, but of course it could continue to generate exactly the 
same code. I think such a switch would be necessary for supporting the 
3- and 4-element array types with swizzle functions discussed recently.

However, if we all agreed to use the same set of version identifiers, we 
can get going immediately.
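
A sketch of how the cascade could be done in user code today, assuming the
names above are passed with -version= on the command line (they are proposals,
not predefined identifiers). Each level switches on the one below it, so
-version=X86_SSE3 also enables the X86_SSE2, X86_SSE and X86_MMX blocks:

```d
import std.stdio : writeln;

// Each identifier must be set before it is first tested, hence the
// newest-to-oldest order.
version (X86_SSE4)  { version = X86_SSSE3; }
version (X86_SSSE3) { version = X86_SSE3; }
version (X86_SSE3)  { version = X86_SSE2; }
version (X86_SSE2)  { version = X86_SSE; }
version (X86_SSE)   { version = X86_MMX; }

// Hypothetical helper: true when this build may assume SSE2.
bool buildAssumesSse2()
{
    version (X86_SSE2)
        return true;
    else
        return false;
}

void main()
{
    writeln("SSE2 assumed at compile time: ", buildAssumesSse2());
}
```

One caveat: a "version = ...;" line only affects the module it appears in, so
the cascade would have to be repeated per module, which is another argument for
standard names handled by the compiler.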

Apr 05 2007

janderson <askme me.com> writes:

Don Clugston wrote:
 Dan wrote:
 Dave Wrote:
 Just a thought for D -- add a global CPU type to the D standard 
 runtime that is set during initialization (i.e.: in Dmain) that can 
 be used by user code like BLADE and by the compiler itself for 
 (future) processor determined optimization branches.

 Actually, I'm kind-of surprised one isn't there already (or is it?).

 There is already a cpu variable/object/module or something within the 
 phobos library which provides cpu related information.  The problem 
 being discussed is that this information is stored and branches are 
 made based on the cpu type during *runtime*.

 If instead, it was pre-determined during compile time, these branches 
 could be optimized out.  Since version() is designed to perform 
 precisely this function, it makes sense to continue to use version() 
 to identify and target cpu's during compile time.

 
 Yes. All that's required is for the spec to include standard names for 
 the CPU types. I don't think any DMD compiler changes are required.
 
 How about these:
 
 X86_MMX   // necessary? (MMX is dead technology).
 X86_SSE
 X86_SSE2
 X86_SSE3
 X86_SSSE3 // is this really necessary?
 X86_SSE4
 
 Only change would be that the GDC compiler for X86_64 should set all of 
 the above.
 Right now, the predefined version identifiers cannot be set from the 
 command line, so eventually the compiler would need a CPU switch to 
 control them - it would specify that the compiler has freedom to use 
 recent opcodes, but of course it could continue to generate exactly the 
 same code. I think such a switch would be necessary for supporting the 
 3- and 4-element array types with swizzle functions discussed recently.
 
 However, if we all agreed to use the same set of version identifiers, we 
 can get going immediately.

I've done this manually with a set of DLL's before.  Basically the exe 
would be a tiny file that would call the correct DLL at startup.  You 
would only compile all the program DLLs when distributing the program. 
This way you only branch when you pick the correct CPU at application start.

Now I should mention Michael Abrash (of whom I'm a big fan) used a 
technique to simulate DirectX7 efficiently in software where he created 
CPU code on the fly for different CPUs.  It didn't suffer from cache 
misses (which are typical in self-modifying programs) because the 
generated data was consumed a frame later.

If the compiler was able to make simple changes to the code at startup 
like a virtual machine or interpreter that would be cool.  You could in 
essence consider it as a compressed version of the DLL strategy I 
employed, except you'd be able to make use of combinations more 
effectively and optimize for AMD, Intel chips, etc.

-Joel

Apr 05 2007

janderson <askme me.com> writes:

Dan wrote:
 Following the discussion on D's hashtables, and my current understanding that
D doesn't implement any of the x86 CPU features beyond Pentium Pro (same as C
and most other things for that matter) I took another quick peek at D's
predefined Versions.
 
 Currently, when I want to use asm {} I have to use:
 
 version(D_InlineAsm_X86)
 
 Unfortunately, this doesn't cover anything to do with most of the later
instruction extensions D claims to support in the spec, such as the SSE, SSE2,
SSE3, SSSE3, MMX, 3DNow! instruction sets, let alone x86_64.  Ultimately, one
would want to have a cascading support mechanism, such that if I want to use
SSSE3, I'm implying that the processor must support everything before it:
 
 version(D_InlineAsm_x86)
 version(D_InlineAsm_x86_3DNow)
 version(D_InlineAsm_x86_MMX)
 version(D_InlineAsm_x86_SSE)
 version(D_InlineAsm_x86_SSE2)
 version(D_InlineAsm_x86_SSE3)
 version(D_InlineAsm_x86_SSSE3)
 version(D_InlineAsm_x86_SSE4)
 version(D_InlineAsm_x86_64)
 
 Predefining those so that compiler writers can raise errors if they don't yet
support SSE3 instructions, but they do support x86 and MMX, would be a smart
thing to do.  It would also allow implementors to write code that WON'T compile
and try to run if the user's CPU doesn't support it - preventing what could be
a devastating crash.
 
 Sincerely,
 Dan

What would be nice is if you could tell the compiler to spit out DLLs 
for each version (for your release build) and also a tiny exe that would 
run the right one.

-Joel

Apr 02 2007
