digitalmars.D - Built-in vector types

Simon Hobbs <Simon_member pathlink.com> writes:

If you want D to gain a great advantage (at least for games) over modern C++
compilers, I believe it would be a smart move to add a built-in type for float4
(and I guess double2 for completeness).

CPU support for these types is getting to be pretty ubiquitous and making them
built-in has all the advantages of c++ intrinsics and far more besides:

1. consistency across implementations

2. native support for constants

3. debug and release code won't have the huge (order of magnitude) speed
disparity that they do with the C++ method. In C++ it is really a prerequisite to
make a class wrapper for the intrinsic functions because they are entirely
unusable in their native form. This works fine, but a vector add, for example,
calls a 12-instruction function in a debug build and is a single vector
instruction in a release build. Trying to debug a 60fps game at 10fps is a royal
pain in the arse.

4. the possibility of adding built-in support for vector swizzling/write
masking/element access (a la Cg/HLSL), although I guess this is rather
contentious

I would be inclined to make dot3, dot4, cross, etc. into intrinsic functions
rather than trying to invent dodgy operators for them.

Another issue that arises is the ability to keep temporary single scalar results
in a vector register so that they don't keep being transferred to and from FPU
registers. Would the optimizer be able to factor this problem away, or would an
explicit float1 (or whatever) type be better?
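
For readers unfamiliar with the C++ approach described in point 3, here is a
minimal sketch of such a wrapper. It is not code from this thread: Float4,
makeFloat4 and store are hypothetical names, and the scalar fallback branch is
an assumption added so the sketch also builds where SSE is unavailable. With
optimization on, operator+ would normally be inlined down to a single packed
add; in an unoptimized build it stays a real function call, which is the
disparity being complained about.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>          // SSE intrinsics: __m128, _mm_add_ps, ...
#define SKETCH_HAVE_SSE 1
#else
#define SKETCH_HAVE_SSE 0       // assumption: fall back to four scalar floats
#endif

// Hypothetical wrapper type, not taken from any real library.
struct Float4 {
#if SKETCH_HAVE_SSE
    __m128 v;
#else
    float v[4];
#endif
};

// Build a Float4 from four scalars, in x, y, z, w order.
inline Float4 makeFloat4(float x, float y, float z, float w) {
    Float4 r;
#if SKETCH_HAVE_SSE
    r.v = _mm_setr_ps(x, y, z, w);   // "setr" takes lanes in memory order
#else
    r.v[0] = x; r.v[1] = y; r.v[2] = z; r.v[3] = w;
#endif
    return r;
}

// The wrapped vector add: one intrinsic call, or four scalar adds.
inline Float4 operator+(const Float4& a, const Float4& b) {
    Float4 r;
#if SKETCH_HAVE_SSE
    r.v = _mm_add_ps(a.v, b.v);
#else
    for (std::size_t i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
#endif
    return r;
}

// Copy the four lanes out to ordinary memory (no alignment required).
inline void store(const Float4& a, float out[4]) {
#if SKETCH_HAVE_SSE
    _mm_storeu_ps(out, a.v);
#else
    for (std::size_t i = 0; i < 4; ++i) out[i] = a.v[i];
#endif
}
```

Usage would look like store(makeFloat4(1,2,3,4) + makeFloat4(10,20,30,40), out).
The fallback branch also illustrates why lowering a vector add to scalar units
is the easy direction.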

Si

May 15 2004

Billy Zelsnack <billy_zelsnack yahoo.com> writes:

+1

D already has very clean access to OpenGL and other C libraries. Having 
a vector type along with some good standard vector/matrix libraries will 
draw a lot of game developers. Game developers are an interesting bunch. 
We typically have terrible time constraints to construct bleeding edge 
technology that is supposed to run real-time on a wide performance range 
of computers.


May 15 2004

"Walter" <newshound digitalmars.com> writes:

The language already supports vector operations on arrays of floats,
doubles, or anything else. Currently, however, it is not implemented in the
compiler.

May 15 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Walter wrote:

 The language already supports vector operations on arrays of floats,
 doubles, or anything else. Currently, however, it is not implemented in
 the compiler.
 
 "Simon Hobbs" <Simon_member pathlink.com> wrote in message
 news:c851kq$v01$1 digitaldaemon.com...
 If you want D to gain a great advantage (at least for games) over modern
 c++ compilers  I believe it would be a smart move to add a built in type
 for float4 (and I guess double2 for completeness.)


Walter, do you know much about Cg? I was just poking around the nvidia site
reading the spec and it looks like they have some interesting ideas to
avoid aliasing (inout is copy-in-copy-out, no pointers, etc) that could be
nifty to pull into D.

May 15 2004

"Walter" <newshound digitalmars.com> writes:

"Ben Hinkle" <bhinkle4 juno.com> wrote in message
news:c868ng$2muf$1 digitaldaemon.com...
 Walter, do you know much about Cg?

I looked at it briefly a couple years back, I think.

 I was just poking around the nvidia site
 reading the spec and it looks like they have some interesting ideas to
 avoid aliasing (inout is copy-in-copy-out, no pointers, etc) that could be
 nifty to pull into D.

May 23 2004

Knud Sørensen <knud NetRunner.all-technology.com> writes:

On Sat, 15 May 2004 11:36:40 -0700, Walter wrote:

 The language already supports vector operations on arrays of floats,
 doubles, or anything else. Currently, however, it is not implemented in the
 compiler.

As far as I can see from http://developer.nvidia.com/attach/6043,
what Cg has and D is missing is matrix multiplication, vector swizzling
and write masking.

I suggest the following syntax for D, using the example from the Cg link.

float[4] vec1 = {4.0, -2.0, 5.0, 3.0};

float[2] vec2 = vec1[1,0];  // vec2 = {-2.0, 4.0}
float scalar = vec1[3];     // scalar = 3.0
float[3] vec3 = scalar;     // vec3 = {3.0, 3.0, 3.0}

Write masking:

vec1[0,3]=vec3;     // vec1 = {3.0,-2.0,5.0,3.0}
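
The semantics of the proposed comma indexing can be sketched in C++ with two
small helpers. This is only an illustration of what the notation above would
mean; swizzle and writeMasked are hypothetical names, nothing here is Cg or D
library code, and indices are not bounds-checked.

```cpp
#include <array>
#include <cstddef>

// Read side: out[k] = src[idx[k]], e.g. vec1[1,0] in the proposed notation.
template <std::size_t N, std::size_t M>
std::array<float, M> swizzle(const std::array<float, N>& src,
                             const std::array<std::size_t, M>& idx) {
    std::array<float, M> out{};
    for (std::size_t k = 0; k < M; ++k) {
        out[k] = src[idx[k]];
    }
    return out;
}

// Write side: dst[idx[k]] = src[k], e.g. vec1[0,3] = vec3.
// Only the first M elements of src are consumed; other slots of dst are kept.
template <std::size_t N, std::size_t M, std::size_t S>
void writeMasked(std::array<float, N>& dst,
                 const std::array<std::size_t, M>& idx,
                 const std::array<float, S>& src) {
    static_assert(M <= S, "source must supply one value per masked slot");
    for (std::size_t k = 0; k < M; ++k) {
        dst[idx[k]] = src[k];
    }
}
```

With vec1 = {4, -2, 5, 3}, swizzling with indices {1, 0} gives {-2, 4}, and
write-masking indices {0, 3} from {3, 3, 3} leaves vec1 = {3, -2, 5, 3},
matching the comments in the proposal.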

Something that has been bothering me about D is
that a slice 0..4 means 0,1,2,3 and not 0,1,2,3,4.
If you choose to use the comma notation for masking,
I think it would be better to have 0..4 as shorthand
for 0,1,2,3,4 instead of 0,1,2,3.


With matrix multiplication, I think it is
better to use the more general Einstein summation,
which would allow a very short notation for vector
calculations.

In this notation an affine transformation (a*v+b) on a vector
would be written like this:

double[4] vec1, vec2, b;
double[4][4] a;

  vec2[i=0..3]=a[i][j=0..3]*vec1[j] + b[i];


but on an array with 100 vectors you could transform with

double[4][100] arr1,arr2;
 
  arr2[i=0..3][k=0..99]=a[i][j=0..3]*arr1[j][k] + b[i];
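
Spelled out as explicit loops, the two statements above would mean roughly the
following. This is a sketch under my reading of the proposal: an index that
appears only on the right-hand side (j) is summed over, the others (i, k) are
plain loops, and the ranges are inclusive as suggested earlier in this post.
The names Vec4, Mat4, Batch, affine and affineBatch are hypothetical.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<Vec4, 4>;      // a[i][j]: row i, column j

// arr[j][k]: component j of vector number k, as indexed in the post.
template <std::size_t K>
using Batch = std::array<std::array<double, K>, 4>;

// vec2[i=0..3] = a[i][j=0..3]*vec1[j] + b[i];
Vec4 affine(const Mat4& a, const Vec4& vec1, const Vec4& b) {
    Vec4 vec2{};
    for (std::size_t i = 0; i < 4; ++i) {
        double sum = b[i];
        for (std::size_t j = 0; j < 4; ++j) {   // repeated index j is summed
            sum += a[i][j] * vec1[j];
        }
        vec2[i] = sum;
    }
    return vec2;
}

// arr2[i=0..3][k=0..K-1] = a[i][j=0..3]*arr1[j][k] + b[i];
template <std::size_t K>
Batch<K> affineBatch(const Mat4& a, const Batch<K>& arr1, const Vec4& b) {
    Batch<K> arr2{};
    for (std::size_t i = 0; i < 4; ++i) {
        for (std::size_t k = 0; k < K; ++k) {   // free index k: one pass per vector
            double sum = b[i];
            for (std::size_t j = 0; j < 4; ++j) {
                sum += a[i][j] * arr1[j][k];
            }
            arr2[i][k] = sum;
        }
    }
    return arr2;
}
```

The batch version is the same loop nest with one extra free index, which is the
regularity a compiler could exploit when targeting a vector unit.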

The advantage of having this implemented in the core
language is that it can exploit the processor's vector unit (MMX),
a graphics processor (GPU), or a math unit like
http://www.clearspeed.com/ without rewriting the program in assembler for
the specific hardware.

Maybe it would be a good idea to have compiler modules for different types
of hardware.

Knud

May 16 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Knud Sørensen wrote:

the advantage of having this implemented in the core 
language is to exploit the processors vector unit (MMX)
,a graphic processors (GPU) or a math unit like 
http://www.clearspeed.com/ without rewriting the program in assembler for
the specific hardware.
  

I would argue that having matrix multiplication and such will bloat the 
language.  It should be a library feature.  I see no problem with 
writing it in assembler (as long as I don't have to write it <g>).  It 
would be better to have this as part of the standard library.  The 
language won't be able to provide much additional speed by hard-wiring 
things like MMX into the language.  Remember MMX and the like are 
designed to work well as language extensions in the first place.

Why not include, in the language, every useful hardware data structure 
under the sun?  Data structures should only be put into the language 
when they make sense and can be done much more cleanly than with libraries.

Maybe it would be a good idea to have compiler modules for different types
of hardware.

  

The language shouldn't be tied to the hardware.  It's the job of library 
vendors to make porting hell, not the language.


-- 
-Anderson: http://badmama.com.au/~anderson/

May 16 2004

Andy Friesen <andy ikagames.com> writes:

J Anderson wrote:
 I would argue that having matrix multiplication and such will bloat the 
 language.  It should be a library feature.  I see no problem with 
 writing it in assembler (as long as I don't have to write it <g>).  It 
 would be better to have this as part of the standard library.  That 
 language won't be able to provide much additional speed by hard-wiring 
 things like MMX into the language.  Remember MMX and the like are 
 designed to work well as language extensions in the first place.

If Phobos included some types and functions for these sorts of 
operations, compiler vendors would hypothetically be able to implement 
those operations as intrinsics.

  -- andy

May 16 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Andy Friesen wrote:

 J Anderson wrote:

 I would argue that having matrix multiplication and such will bloat 
 the language.  It should be a library feature.  I see no problem with 
 writing it in assembler (as long as I don't have to write it <g>).  
 It would be better to have this as part of the standard library.  
 That language won't be able to provide much additional speed by 
 hard-wiring things like MMX into the language.  Remember MMX and the 
 like are designed to work well as language extensions in the first 
 place.


 If Phobos included some types and functions for these sorts of 
 operations, compiler vendors would hypothetically be able to implement 
 those operations as intrinsics.

  -- andy

Exactly!

-- 
-Anderson: http://badmama.com.au/~anderson/

May 16 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Simon Hobbs wrote:

 If you want D to gain a great advantage (at least for games) over modern
 c++
 compilers  I believe it would be a smart move to add a built in type for
 float4 (and I guess double2 for completeness.)

Does float4 have value or reference semantics?
I don't think I'd use a float4 myself since I'm not a game programmer but it
has repeatedly come up about using (shortish) arrays with value semantics.
Some generic "static array with value-semantics" would be cool. Right now
to get something like it I'm using a struct with the type and length as
template parameters. It works fine but is verbose.

May 15 2004

Billy Zelsnack <billy_zelsnack yahoo.com> writes:

 Does float4 have value or reference semantics?
 I don't think I'd use a float4 myself since I'm not a game programmer but it
 has repeatedly come up about using (shortish) arrays with value semantics.
 Some generic "static array with value-semantics" would be cool. Right now
 to get something like it I'm using a struct with the type and length as
 template parameters. It works fine but is verbose.

I would like to not care when passing it around and trust the compiler 
to make the fastest decision for me. I have tons of c++ code that looks 
something like this:

void doSomething(const float3& vecA)

I pass by const reference because I am assuming it will be faster, but 
in some cases it just might not be. Who knows, and I don't really care 
how it is passed as long as it is the fastest way possible.

As for (shortish), I regularly use float2, float3, float4, and float16. 
float16 is for a 4x4 matrix, but that could just be 4 float4 and be just 
as efficient. So I think 2,3,4 lengths would give you 99% of the value 
for vector types (as far as game development is concerned).

May 15 2004

Knud Sørensen <knud NetRunner.all-technology.com> writes:

Hi 

Did you read my post on Einstein notation ??

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/288

Would that be useful for game developers ??

I have been thinking that you could drop the index[] notation 
in my first post and just define the slice at first contact.

Like

v2[i=0..3]=m[i][k=0..5]*v1[k];

for 4x6 matrix multiplication 

or 

det=M[0][i=0..3]*M[1][j=0..3]*M[2][k=0..3]*M[3][l=0..3]*P(i,j,k,l);

to compute the determinant for 4x4 matrix.
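
Written as loops, the determinant line would sum over all four indices. The
sketch below assumes P(i,j,k,l) is the permutation symbol (+1 for an even
permutation of 0,1,2,3, -1 for an odd one, 0 if any index repeats); the post
does not define it, so that reading and the names permutationSign and det4 are
assumptions.

```cpp
#include <array>
#include <cstddef>

using Mat4 = std::array<std::array<double, 4>, 4>;

// Assumed meaning of P(i,j,k,l): sign of the permutation, or 0 on a repeat.
// The sign is found by counting inversions (pairs that are out of order).
int permutationSign(const std::array<std::size_t, 4>& p) {
    int sign = 1;
    for (std::size_t a = 0; a < 4; ++a) {
        for (std::size_t b = a + 1; b < 4; ++b) {
            if (p[a] == p[b]) return 0;      // repeated index: term vanishes
            if (p[a] > p[b]) sign = -sign;   // each inversion flips the sign
        }
    }
    return sign;
}

// det = M[0][i] * M[1][j] * M[2][k] * M[3][l] * P(i,j,k,l), summed over i,j,k,l.
double det4(const Mat4& m) {
    double det = 0.0;
    for (std::size_t i = 0; i < 4; ++i)
        for (std::size_t j = 0; j < 4; ++j)
            for (std::size_t k = 0; k < 4; ++k)
                for (std::size_t l = 0; l < 4; ++l) {
                    const std::array<std::size_t, 4> p = {{i, j, k, l}};
                    const int s = permutationSign(p);
                    if (s != 0) {
                        det += s * m[0][i] * m[1][j] * m[2][k] * m[3][l];
                    }
                }
    return det;
}
```

Only 24 of the 256 index combinations contribute, so this is a readable
expansion of the notation rather than an efficient way to compute a
determinant.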

May 15 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Billy Zelsnack wrote:

 Does float4 have value or reference semantics?
 I don't think I'd use a float4 myself since I'm not a game programmer but
 it has repeatedly come up about using (shortish) arrays with value
 semantics. Some generic "static array with value-semantics" would be
 cool. Right now to get something like it I'm using a struct with the type
 and length as template parameters. It works fine but is verbose.

 
 I would like to not care when passing it around and trust the compiler
 to make the fastest decision for me. I have tons of c++ code that looks
 something like this:
 
 void doSomething(const float3& vecA)
 
 I pass by const reference because I am assuming it will be faster, but
 in some  cases it just might not be. Who knows and I don't really care
 how it is passed as long as it is the fastest way possible.
 
 As for (shortish), I regulary use float2,float3,float4, and float16.
 float16 is for a 4x4 matrix, but that could just be 4 float4 and be just
 as efficient. So I think 2,3,4 lengths would give you 99% of the value
 for vector types (as far as game development is concerned).

Reference or value semantics makes a difference with code like
 float4 x,y;
 ... 
 x[0] = 2.0; // or whatever the syntax is for an element of x
 y = x;
 y[0] = 1.0;

What is x[0]? reference semantics says 1.0, value semantics says 2.0.
Similarly with reference semantics you need to be careful with
  float4 doSomething() {
    float4 res;
    ...
    return res;
  }
unless the memory for "res" is either passed in as an input or allocated
from the heap or something like that. 
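
The difference can be reproduced in C++, where both behaviours are available.
This is a sketch, not a statement about how a built-in float4 would behave in
D; Vec4f is a hypothetical stand-in type, and the C++ reference plays the role
of "reference semantics".

```cpp
// Hypothetical stand-in for float4: a plain struct of four floats.
struct Vec4f {
    float e[4];
};

// Value semantics: "y = x" copies, so writing to y leaves x alone.
float withValueSemantics() {
    Vec4f x = {};
    x.e[0] = 2.0f;
    Vec4f y = x;       // independent copy of x
    y.e[0] = 1.0f;
    return x.e[0];     // still 2.0
}

// Reference semantics: y is another name for x, so the write is visible in x.
float withReferenceSemantics() {
    Vec4f x = {};
    x.e[0] = 2.0f;
    Vec4f& y = x;      // alias, no copy
    y.e[0] = 1.0f;
    return x.e[0];     // now 1.0
}

// Returning by value is safe: the caller receives its own copy of res.
// Under reference semantics the same function would hand back a reference
// to storage that no longer exists once the function has returned.
Vec4f doSomething() {
    Vec4f res = {};
    res.e[0] = 2.0f;
    return res;
}
```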

-Ben

May 15 2004

Simon Hobbs <Simon_member pathlink.com> writes:

In article <c85srd$268c$1 digitaldaemon.com>, Ben Hinkle says...
Does float4 have value or reference semantics?
I don't think I'd use a float4 myself since I'm not a game programmer but it
has repeatedly come up about using (shortish) arrays with value semantics.
Some generic "static array with value-semantics" would be cool. Right now
to get something like it I'm using a struct with the type and length as
template parameters. It works fine but is verbose.


Sorry, I'm not explaining myself properly.

float4 would have value semantics and would represent a vector hardware register
(e.g. an SSE register in X86.) and use SIMD instructions to perform
add/sub/mul/div/etc... It would be subject to all of the optimizations that the
compiler can currently do on floats and ints.

Modern C++ compilers support the use of these registers/instructions through
intrinsics (or Dylan Cuthbert's extensions to GCC on PS2) and unless D can at
least match them, then as a professional games programmer I'll never be able to
justify the use of D - even in spite of all the other great language features :(

But I'd actually like to see D go one step further by supporting these types in
the language and leapfrogging C++ in an important way in the process.

I'm only interested in extreme performance, so making a struct that looks like a
Cg type is pointless, although interesting :)

Si

May 16 2004

hellcatv hotmail.com writes:

Ideally, a struct with vector-like math ops should vectorize,
but I don't think any compiler we have right now matches that definition of
ideal.

In some ways it would be best to concentrate on figuring out how to enable fast
optimizations on structures that happen to do vector ops, rather than programming
specific types for architectures that have SIMD instructions that may go away in
the very next generation of hardware (what if they have scalar units instead next
time around?).

I would be curious what the hit would be of using my Cg struct as opposed to
doing the raw math on 3 local vars... I suspect it's a lot. In C++ it certainly
is with gcc: Visual Studio makes it about a 50% speed hit, but with gcc it's more
like a 75% speed hit.

May 16 2004

Simon Hobbs <Simon_member pathlink.com> writes:

In article <c87g2f$1ea5$1 digitaldaemon.com>, hellcatv hotmail.com says...
ideally having a struct with vector-like math ops should vectorize
but I don't think any compiler right now we have matches that definition of
ideal

in some ways it would be best to concentrate on figuring out how to enable fast
optimizations on structures that happen to do vector ops rather than programming
specific types for architectures that have SIMD instructions that may go away in
the very next gen of hardware (what if they have scalar units instead next time
around)

Well, in the future when X86 and PowerPC 'go away' it would still be trivial for
the compiler to implement a vector add using scalar units. As you point out, it
is doing the opposite that proves problematic.

Si

May 16 2004

Ben Hinkle <bhinkle4 juno.com> writes:

Simon Hobbs wrote:

 In article <c85srd$268c$1 digitaldaemon.com>, Ben Hinkle says...
Does float4 have value or reference semantics?
I don't think I'd use a float4 myself since I'm not a game programmer but
it has repeatedly come up about using (shortish) arrays with value
semantics. Some generic "static array with value-semantics" would be cool.
Right now to get something like it I'm using a struct with the type and
length as template parameters. It works fine but is verbose.

 
 
 Sorry, I'm not explaining myself properly.
 
 float4 would have value semantics and would represent a vector hardware
 register (e.g. an SSE register in X86.) and use SIMD instructions to
 perform add/sub/mul/div/etc... It would be subject to all of the
 optimizations that the compiler can currently do on floats and ints.

OK. My first thought was to use the inline assembler but now that you say it
uses SSE registers I guess even with asm blocks you'd have to make sure the
right registers are filled when you call add/sub/etc. That would mess up
the data-flow optimizations. Still it is an option. It is kind of like
bringing back the "register" storage attribute from C (shudder).

 Modern C++ compilers support the use of these registers/instructions
 through intrinsics (or Dylan Cuthbert's extensions to GCC on PS2) and
 unless D can at least match them, then as a professional games programmer
 I'll never be able to justify the use of D - even in spite of all the
 other great language features :(

If these GCC extensions work on the x86 then gdc could pick them up. DMD
would take longer though.

 But I'd actually like to see D go one step further by supporting these
 types in the language and leapfrogging C++ in an important way in the
 process.
 
 I'm only interested in extreme performance, so making a struct that looks
 like a Cg type is pointless, although interesting :)

D is young so the performance will certainly improve somewhat, maybe not to
the extreme you are looking for. When I read about D it struck me as a
slightly lower level version of Java/Csharp. I never expected it to have
extreme performance of something like Fortran or Cg or even all the
customizability of C++. So for me that's all bonus :-)
Still some vectorized support could benefit both the game and scientific
computing worlds.

I was just googling to see if anyone has tried putting BLAS on the GPU and
sure enough people are looking into it. See for example 
  http://wwwcg.in.tum.de/Research/data/Publications/sig03.pdf
That would mean numerical algorithms run on the GPU instead of the CPU for
the vectorized ops. There are probably tons of problems with getting the
data there and back but it's a neat possibility. Who says a graphics card
is just for graphics? ;-)

May 16 2004

hellcatv hotmail.com writes:

Actually, my research project involves doing things like BLAS on the GPU:
http://graphics.stanford.edu/projects/brookgpu/

In fact we benchmarked a lot of the BLAS stuff
(the code for the benchmarks is available on the above website and in CVS).
On the matrix-vector operations (SAXPY and Dot) the performance was quite
impressive.
However, matrix-matrix multiply sucks on the GPU. We get the full bandwidth out
of the cache, but full bandwidth out of that cache is half or a quarter of the
full bandwidth out of the CPU cache, so there's no chance you win on
matrix-matrix.

Anyhow, feel free to download our Brook platform and try writing some GPU
programs yourself. (I recommend getting the CVS version right now; the
released version is falling behind in features.)
And feel free to chat with me about what kinds of apps will work well on the
GPU. The answer is apps that reuse their data a finite number of times;
things that get huge cache performance on the CPU are not likely candidates.
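
For reference, the two benchmarked operations named above have the following
standard BLAS definitions, shown here as plain CPU loops. This is a sketch for
illustration, not the Brook benchmark code; the function names mirror the BLAS
ones but the signatures are simplified assumptions.

```cpp
#include <cstddef>
#include <vector>

// SAXPY: y[i] = a * x[i] + y[i] ("single-precision a times x plus y").
// Processes as many elements as both vectors have.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Dot product: sum of x[i] * y[i] over the common length.
float dot(const std::vector<float>& x, const std::vector<float>& y) {
    const std::size_t n = x.size() < y.size() ? x.size() : y.size();
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

In both loops each input element is read exactly once, so there is nothing for
a cache to reuse; a matrix-matrix multiply, by contrast, reads every element
many times.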
--Daniel


May 16 2004

hellcatv hotmail.com writes:

I have implemented Cg's float2, float3 and float4 classes in D.
They're exactly like Cg.

I suggest people just use the standard pioneered by Microsoft and Nvidia for the
vector format :-)
If we can build it into the compiler, great.

Download it here (it's GPL right now, but as the author I'm willing to relicense
it at your request):
http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/vec.d

http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/matrix.d


May 15 2004

hellcatv hotmail.com writes:

First of all I'd like to say that if Walter wishes to integrate my vec.d into
his language he can have it under the BSD license, or another license if he wants
to talk to me about it. Other users must talk to me about changing the license,
but I'm quite flexible.

I was a bit brief about how my float2, float3 and float4 work
in http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/vec.d

It's almost exactly like the Cg spec except for a few caveats:
a) as you can see in my earlier posts, I was complaining that the opCmp operator
must return a single int, hence I can't do the 4-way <, > and ==
comparisons... so I use the dot product to get a partial ordering
b) you can still do component-wise compares using opLess, opGreater,
opLEqual and so forth.
c) you can assign using the swizzle operators
float4 myvar=float4(1,2,3,4);
float3 mytmp.yzx = myvar.zyx;
..
d) you cannot repeat letters in the swizzle operators unless you are on gdc
because digital mars' link.exe has a bug that crashes if too many functions are
defined in a single file
float3 mytmp.xyz = myvar.zyz; <-- only works if you define Swizzle as a compiler
flag
otherwise the alternative is
float3 mytmp.xyz = myvar.swizzle(2,1,2);
it's just as powerful a syntax and only necessary if you repeat components.

e) I have provided real2 real3 and real4 and double2 double3 and double4
vectors.

f) I welcome contributions to the lib... and especially benchmarking of it.
g) I have provided all the intrinsic functions within Cg (cos, lerp, etc)--they
also work on the intrinsic float, double and real types.
h) this lib really pushes the Digital Mars compiler to its limit--adding one or
two functions causes the linker to crash under Windows.
of course gdc is golden
i) The lib is created using a single template class and a *lot* of
instantiations of that class (so the user needn't type vec!(real,4); of course
that syntax works as well)
--Daniel
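
For readers who want to see what swizzling and component-wise compares look like in code, here is a minimal sketch in present-day D. It is not vec.d and does not reproduce its API: the Vec template, the componentIndex and isSwizzle helpers, and the less method are hypothetical names. It relies on opDispatch, which later versions of D provide, so repeated letters need no extra hand-written functions. Only reading a swizzle is shown; assigning through one is left out.

```d
import std.stdio : writeln;

// Index of a component letter, or size_t.max if it is not x, y, z or w.
private size_t componentIndex(char c)
{
    switch (c)
    {
        case 'x': return 0;
        case 'y': return 1;
        case 'z': return 2;
        case 'w': return 3;
        default:  return size_t.max;
    }
}

// True when s has 2 to 4 letters and each names a component of an n-vector.
private bool isSwizzle(string s, size_t n)
{
    if (s.length < 2 || s.length > 4) return false;
    foreach (c; s)
        if (componentIndex(c) >= n) return false;
    return true;
}

// Hypothetical vector type for illustration only.
struct Vec(T, size_t N)
{
    T[N] data = 0;

    // Read swizzle such as v.zyx or v.zyz; repeated letters are allowed.
    auto opDispatch(string s)() const
        if (isSwizzle(s, N))
    {
        Vec!(T, s.length) r;
        foreach (i, c; s)
            r.data[i] = data[componentIndex(c)];
        return r;
    }

    // Component-wise "less than": one bool per element.
    bool[N] less(const Vec rhs) const
    {
        bool[N] r;
        foreach (i; 0 .. N)
            r[i] = data[i] < rhs.data[i];
        return r;
    }
}

alias float3 = Vec!(float, 3);
alias float4 = Vec!(float, 4);

void main()
{
    auto v = float4([1.0f, 2.0f, 3.0f, 4.0f]);
    float3 a = v.zyx; // first three components, reversed
    float3 b = v.zyz; // repeated letter
    writeln(a.data, " ", b.data);
    writeln(v.less(float4([2.0f, 2.0f, 2.0f, 2.0f])));
}
```

Because the swizzle name is checked in the template constraint, something like v.xq or a .w on a float3 is rejected at compile time rather than silently picking another component.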


In article <c86boa$2r88$1 digitaldaemon.com>, hellcatv hotmail.com says...
I have implemented Cg's float2 float3 and float4 classes in D
they're exactly like Cg

I suggest people just use the standard pioneered by Microsoft and Nvidia for the
vector format :-)
if we can build it into the compiler, great

download it here (it's GPL right now, but as the author I'm willing to relicense
it at your request)
http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/vec.d

http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/matrix.d


In article <c851kq$v01$1 digitaldaemon.com>, Simon Hobbs says...
If you want D to gain a great advantage (at least for games) over modern c++
compilers  I believe it would be a smart move to add a built in type for float4
(and I guess double2 for completeness.)

CPU support for these types is getting to be pretty ubiquitous and making them
built-in has all the advantages of c++ intrinsics and far more besides:

1. consistency across implementations

2. native support for constants

3. debug and release code won't have the huge (order of magnitude) speed
disparity that they do in the c++ method. In c++ it is really a pre-requisite to
make a class wrapper for the intrinsic functions because they are entirely
un-usable in their native form. This works fine but a vector add, for example,
calls a 12 instruction function in a debug build and in a release build is a
single vector instruction. Trying to debug a 60fps game at 10fps is a royal pain
in the arse.

4. the possibility of adding built-in support for vector swizzling/write
masking/element access (a la Cg/HLSL) although I guess this is rather
contentious

I would be inclined to make dot3, dot4, cross, etc. into intrinsic functions rather
than trying to invent dodgy operators for them.

Another issue that arises is the ability to keep temporary single scalar results
in a vector register so that they don't keep being transferred to and from FPU
registers. Would the optimizer be able to factor this problem away, or would an
explicit float1 (or whatever) type be better?

Si

May 15 2004

Ben Hinkle <bhinkle4 juno.com> writes:

hellcatv hotmail.com wrote:

 First of all I'd like to say that if Walter wishes to integrate my vec.d
 into his language he can have it under the BSD license or another license
 if he wants to talk to me about it. Other users must talk to me about
 changing the license but I'm quite flexible.
 
 I was a bit brief about how my float2 float3 and float4 work
 in http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/vec.d

very nifty. I feel your pain implementing all those swizzle operators. It
feels like lisp again with cdr, cadr, cadadr etc etc :-) How many are there
- it looks like >50. yikes.

But what's with the pretty wacky idea that for nsize < 3 the z() property
should return x()? Does Cg really do that? It seems pretty perverse that
asking for higher dimension information just picks some valid dimension and
uses that. Seems random to me. But then again maybe there's a reason.

Otherwise it is very cool to have vectorized math operations and such. I
definitely wouldn't mind seeing a simplified version of vec getting
included in phobos somewhere. All those swizzles make my head ... well...
spin.
What I've been using is just:

// helper to make an array "literal" with value semantics
// Example: 
//   uintn!(3)(100,200,300)
struct InlineArray(T,int N) {
  T[N] array;

  static .InlineArray!(T,N) opCall(T x0,...) {
    .InlineArray!(T,N) res;
    res.array[] = (&x0)[0..N][];
    return res;
  }

  static .InlineArray!(T,N) opCall(T[N] x) {
    .InlineArray!(T,N) res;
    res.array[] = x[];
    return res;
  }

  T opIndex(int i) {
    return array[i];
  }

  void opIndex(int i, T val) {
    array[i] = val;
  }

  // todo: arithmetic, cmp, etc
}
template uintn(int N) {
  alias InlineArray!(uint,N) uintn;
}
template intn(int N) {
  alias InlineArray!(int,N) intn;
}
template floatn(int N) {
  alias InlineArray!(float,N) floatn;
}
template doublen(int N) {
  alias InlineArray!(double,N) doublen;
}
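
The "todo: arithmetic" part could look something like the following in present-day D. This is a sketch under assumptions, not an extension of the code above as it was written: it reuses the InlineArray name for a fresh, self-contained struct, uses the templated opBinary that later D versions provide, and constructs values with a plain struct literal instead of the opCall helpers.

```d
import std.stdio : writeln;

// Hypothetical reworking of the fixed-size array wrapper with value semantics.
struct InlineArray(T, size_t N)
{
    T[N] array;

    // Element-wise +, -, * and / between two arrays of the same shape.
    InlineArray opBinary(string op)(const InlineArray rhs) const
        if (op == "+" || op == "-" || op == "*" || op == "/")
    {
        InlineArray res;
        foreach (i; 0 .. N)
            res.array[i] = cast(T)(mixin("array[i] " ~ op ~ " rhs.array[i]"));
        return res;
    }

    // Indexing returns a reference, so a[i] can be read or assigned.
    ref inout(T) opIndex(size_t i) inout
    {
        return array[i];
    }
}

void main()
{
    auto a = InlineArray!(uint, 3)([100u, 200u, 300u]);
    auto b = a; // copies all three elements
    b[0] = 1;   // changes b only, a is untouched
    auto c = a + b;
    writeln(a.array, " ", b.array, " ", c.array);
}
```

Since the struct wraps a static array, assignment and parameter passing copy the elements, which is the value-semantics behaviour discussed earlier in the thread.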

May 15 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

hellcatv hotmail.com wrote:

First of all I'd like to say that if Walter wishes to integrate my vec.d into
his language he can have it under the BSD license or another license if he wants
to talk to me about it. Other users must talk to me about changing the license
but I'm quite flexible.

I was a bit brief about how my float2 float3 and float4 work
in http://cvs.sourceforge.net/viewcvs.py/deliria/deliria/vec.d

but it's almost exactly like the Cg spec except for a few caveats
a) as you can see in my posts before I was complaining that the opCmp operator
must only return an int hence I can't do the 4-way < and > and ==
comparisons...so I use the dot product to get a partial ordering
b) you can still do component-wise compares using the opLess and opGreater and
opLEqual and so forth.
c) you can assign using the swizzle operators
float4 myvar = float4(1,2,3,4);
float3 mytmp;
mytmp.yzx = myvar.zyx;
..
d) you cannot repeat letters in the swizzle operators unless you are on gdc,
because Digital Mars' link.exe has a bug that crashes if too many functions are
defined in a single file
mytmp.xyz = myvar.zyz; <-- only works if you define Swizzle as a compiler
flag
otherwise the alternative is
mytmp.xyz = myvar.swizzle(2,1,2);
it's just as powerful a syntax and only necessary if you repeat components.

e) I have provided real2 real3 and real4 and double2 double3 and double4
vectors.

f) I welcome contributions to the lib... and especially benchmarking of it.
g) I have provided all the intrinsic functions within Cg (cos, lerp, etc)--they
also work on the intrinsic float, double and real types.
h) this lib really pushes the Digital Mars compiler to its limit--adding one or
two functions causes the linker to crash under Windows.
of course gdc is golden
i) The lib is created using a single template class and a *lot* of
instantiations of that class (so the user needn't type vec!(real,4); of course
that syntax works as well)
--Daniel
  

Nice.  Perhaps you should check out / take some ideas from Burton's 
math.d class (in undig).  It seemed pretty complete and had some nifty 
ideas.

-- 
-Anderson: http://badmama.com.au/~anderson/

May 15 2004
