digitalmars.D - Small Vectors Proposal
- Mikola Lysenko (161/161) Jan 30 2007 Coordinate geometry is a fundamental part of mathematics. Many
- Oskar Linde (25/36) Jan 30 2007 I think the arguing makes much sense, but as the proposal adds 57(!) new...
- Frits van Bommel (4/51) Jan 30 2007 Something like this may unfortunately be necessary if backwards
- Mikola Lysenko (9/20) Jan 30 2007 After thinking about it some more, I believe this approach would
- Benji Smith (14/24) Jan 30 2007 Three and four dimensional vectors are nice for physics and whatnot, but...
- Mikola Lysenko (35/39) Jan 30 2007 I can understand your desire to create a universal solution, and for a
- janderson (17/65) Feb 01 2007 I don't think its critical that high order matrix multiplications get
- Chad J (15/88) Feb 01 2007 I think we should have array operations for dynamic arrays
- Benji Smith (12/19) Feb 07 2007 The more I think about it, the more I agree.
- Don Clugston (8/21) Feb 01 2007 I completely agree. For large vectors and matrices, you're going to have...
- Bill Baxter (10/33) Feb 01 2007 I'd say that's not necessarily true with games using more and more
- Bill Baxter (8/25) Jan 30 2007 Yes I agree with Oskar and Walter that all those 57 primitive type names...
- Frits van Bommel (9/21) Jan 30 2007 [snip]
- Mikola Lysenko (8/35) Jan 30 2007 Hmm... Fair enough. We could define ordering on a per component level
- Frits van Bommel (18/28) Jan 30 2007 Your code is wrong when e.g. (x > a) and (y < b). It returns false true
- Mikola Lysenko (4/36) Jan 30 2007 Whoops! I wasn't really thinking too carefully when I wrote that.
- Frits van Bommel (5/8) Jan 30 2007 It also fits really well with the "plain static array" proposed change:
- Walter Bright (8/8) Jan 30 2007 Thank you, this is in line with what I thought you meant. I'm glad I was...
- Mikola Lysenko (7/20) Jan 30 2007 Yes, Oskar made a good point in his post. The main issue I was
- Andrei Alexandrescu (See Website For Email) (9/22) Jan 30 2007 I don't think this is advisable. It practically introduces an arbitrary
- Walter Bright (4/23) Jan 30 2007 I agree that it's a very bad idea to have such a discontinuity in ref vs...
- Andrei Alexandrescu (See Website For Email) (4/28) Jan 30 2007 So then arrays by ref and a stdlib template that wraps an array in a
- Oskar Linde (19/47) Feb 01 2007 I disagree. What you describe is a way of working around a design issue
- Andrei Alexandrescu (See Website For Email) (7/23) Feb 01 2007 But then people will complain about the inconsistency between static
- Frits van Bommel (9/15) Feb 01 2007 I don't see much complaints about the "inconsistency" between structs
- Lionello Lunesu (14/36) Jan 31 2007 That difference could be made using different notations:
- janderson (17/30) Jan 31 2007 I think some of the properties could be used with multiple dimensions.
- Andrei Alexandrescu (See Website For Email) (45/180) Jan 30 2007 Having read the proposal, I think it can likely be accommodated entirely...
- Bill Baxter (8/21) Jan 30 2007 Your responses to the other points seem reasonable and encouraging, but
- Andrei Alexandrescu (See Website for Email) (21/43) Jan 30 2007 Nonono. I just meant: the fact that today's architectures have
- Deewiant (5/18) Feb 01 2007 I presume the original proposal would have allowed both cast(float2)(int...
- Andrei Alexandrescu (See Website For Email) (9/28) Feb 01 2007 It is a change that I personally think should be made. Walter devised
- Walter Bright (5/10) Feb 01 2007 Yes, it's the "when we don't know what the semantics of this combination...
- Andrei Alexandrescu (See Website For Email) (4/15) Feb 01 2007 That's how D started without templates and ended up with having one of
- Chad J (26/26) Jan 30 2007 Another thing I realized it would be nice to have is some intrinsic
- Knud Soerensen (31/31) Feb 02 2007 I think it is better to make very general framework for describing the
- janderson (19/56) Feb 02 2007 I think this is a special case. Typically I'd say the programmer would
- Mikola Lysenko (56/72) Feb 03 2007 Well, I can't say that I share your enthusiasm for the power of
- janderson (9/44) Feb 03 2007 I couldn't agree more with this last part. However I must say that for
- Knud Soerensen (23/107) Feb 03 2007 I agree with you on this.
Coordinate geometry is a fundamental part of mathematics. Many important problems in computer science are easily expressed in terms of vectors and products. Many domains could benefit from enhanced vector performance, especially scientific and graphical applications. Despite their importance, very few languages provide support for vector types. The prevailing notion is that it should be up to the compiler to optimize and somehow automatically determine where SIMD code can be applied. The problem with this approach is that it is often very difficult for the compiler to spot where a SIMD operation can be applied, and code must be carefully tweaked before the compiler recognizes the optimization opportunity. Even then, many programmers still write their own low-dimensional vector classes for convenience, and perform all SIMD optimizations by hand in assembler.

It is important to make the distinction between low-dimension vectors and other higher-order arrays. This is crucial since the higher-dimensional arrays are often more tenuously connected with any sort of geometric or physical interpretation. Moreover, many architectures are specially optimized for the lower-dimensional cases, and offer special registers which are virtually impossible to properly exploit using libraries alone. The situation is analogous to floating-point arithmetic: proper hardware support requires language-level integration. Shader languages (Cg/GLSL) already provide extremely robust vector arithmetic support, and give some direction to this proposal.

Here are the proposed language extensions:

1. Additional primitive types

All numerical primitive types in the language will be supplemented with 3 additional vector extensions. Each vector extension will contain 2, 3 or 4 components, since these are best supported by hardware and the most relevant geometrically. The initial value for each component in each type will be the same as that of its base type.
The size of each vector type will be equal to the size of its base type times its dimension. Here is a complete list of the new types to be added:

Integer types: byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, ucent3, ucent4

Floating types: float2, float3, float4, double2, double3, double4, real2, real3, real4

Imaginary types: ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, ireal4

Complex types: cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, creal4

All new vector types are pass-by-value, and may be used in combination with any number of other types. The following expressions would all be valid:

int2[] a;
float4* b;
char[][ucent4] c;

2. Vector literals

It must be possible to declare a vector literal unambiguously as an argument to a function, or in any number of other situations. This syntax must be concise, since it will be used often, and it must be flexible, allowing vectors to be constructed via concatenation. In this light, the following is proposed:

float4(1, 2, 3, 4);
int2(100, 200);
creal3(1 + 2i, 3 + 4i, 5 - 6i);

Additionally, this syntax will support the construction of vectors from other sub-vectors. This can be useful for augmenting a 3-vector with a fourth homogeneous component for transformation, amongst other things. Here are some examples:

float4(float3(x, y, z), 1);        // returns (x, y, z, 1)
int3(0, int2(a, b));               // returns (0, a, b)
creal4(ireal2(r, z), real2(p, q)); // returns (r, z, p, q)

3. Casting rules

Vector types will obey the same casting rules as their base type within a single dimension. It is not possible to use a cast to change the dimension of a vector.

cast(float2)(int2(1, 2));          // ok
cast(int)(float2(3, 4));           // error, int is not the same dimension as float2
real3(a, b, c) + ireal3(x, y, z);  // ok, type of expression is creal3

4.
Component-Wise Arithmetic

All of the arithmetic operators on primitive types will be extended such that, when applied to vectors, they operate on each component independently. The sizes and types of the vectors must be compatible with the operator.

float4(1, 2, 3, 4) + float4(3, 5, 6, 7);       // Vector-Vector operation, returns (4, 7, 9, 11)
float3(1, 5, 6) * float3(1, 0, 2);             // returns (1, 0, 12)
uint2(0xFF00FF, 0x00FF88) & uint2(0xFF, 0xFF); // returns (0xFF, 0x88)
int3(x, y, z) * int2(a, b);                    // error, sizes of vectors do not match

Additionally, scalar-vector operations will be supported. Each scalar-vector operator is applied per component within the vector.

int2(0, 3) * 5;   // Vector-Scalar operation, returns (0, 15)
byte2(2, 5) + 6;  // returns (8, 11)

Note that the only two comparison operators allowed on vectors are == and !=. The problem with <, >, etc. is that there is no commonly agreed-upon definition for such things. Therefore the operators are omitted to avoid confusion.

int2(1, 2) == int2(1, 2);          // ok, evaluates to true
float3(1, 0, 0) < float3(0, 0, 1); // error, < is undefined for vectors

5. Swizzle operations

In many vector applications, it is often necessary to reorder the components. In shader languages this is accomplished via 'swizzle' properties on each vector type. Each swizzle selects a set of 1-4 vector components and places them into a new vector of the given dimension. Conventionally these components have the following names:

x - first component
y - second component
z - third component
w - fourth component

A swizzle is then given as a property consisting of 1-4 components in a given order.
Here are some examples:

float2(a, b).yx;             // evaluates to (b, a)
float3(1, 2, 3).zzyx;        // evaluates to (3, 3, 2, 1)
byte4('a', 'b', 'c', 'd').w; // evaluates to 'd'
creal3(1 + 2i, 0, 4i).zzzz;  // evaluates to (4i, 4i, 4i, 4i)
int2(1, 2).z;                // error, vector does not have 3 components

These can be useful when projecting homogeneous vectors into a lower-dimension subspace, or when computing various products. Here is an example of a right-handed cross product implemented using swizzles:

float3 cross(float3 a, float3 b)
{
    return a.yzx * b.zxy - a.zxy * b.yzx;
}

And here is a simple homogeneous projection:

float3 project(float4 hg)
{
    return hg.xyz / hg.w;
}

6. Standard Library Additions

The basic vector type additions have been kept as spartan as possible to avoid cluttering the basic language while still allowing the maximum expressiveness. Instead of defining the various vector products as properties, they are implemented within a standard library extension. The reason for this is simple: many users may wish to redefine these products based on their own coordinate system. Within the standard library, all vector products are performed in right-handed Euclidean space. Vector-specific functions will be placed in a new module, std.vmath, and all functions in std.math will be extended to work on vectors as well as primitive types. Here is a list of functions std.vmath will support, along with a brief description:

dot(a, b) - Computes the dot product of two vectors: a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w
cross(a, b) - Computes the cross product as defined above. Only defined for 3d vectors.
perp(a, b) - Computes the perp product of two vectors as (a.x * b.y - a.y * b.x). Only defined for 2d vectors.
mag(a) - Computes the magnitude of a vector, given as: sqrt(x * x + y * y + z * z + w * w)
mag2(a) - Computes the square of the magnitude of a vector.
norm(a) - Returns a / mag(a)
lerp(a, b, t) - Computes an interpolated position between two vectors:
a * (1 - t) + b * t

Additionally, it would be convenient to have quaternion functions supported in the standard library. A quaternion q would be a 4-vector with the following form:

q = q.w + q.x * i + q.y * j + q.z * k

qconj(a) - Computes the quaternion conjugate of a, given as float4(-a.xyz, a.w)
qmul(a, b) - Computes the quaternion product of two 4d vectors.
qrot(a, b) - Performs a quaternion rotation/dilation on the 3d vector b by the 4d quaternion a.

Finally, there are range manipulation functions:

clamp(a, min, max) - Clamps each component of a to the range [min, max]
minv(a) - Returns the smallest component in the vector a
maxv(a) - Returns the largest component in the vector a
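To make the intended semantics of the std.vmath functions above concrete, here is a small illustrative sketch in Python (not proposed D code). Vectors are plain tuples, quaternions are stored as (x, y, z, w) with w the scalar part, and the function names follow the list above:

```python
import math

def dot(a, b):
    # a.x * b.x + a.y * b.y + ... over however many components the vectors have
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    # 3-D only; right-handed, matching the swizzle form a.yzx * b.zxy - a.zxy * b.yzx
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def mag(a):
    # sqrt of the sum of squared components
    return math.sqrt(dot(a, a))

def norm(a):
    # a / mag(a), per component
    m = mag(a)
    return tuple(x / m for x in a)

def lerp(a, b, t):
    # a * (1 - t) + b * t, per component
    return tuple(x * (1 - t) + y * t for x, y in zip(a, b))

def qmul(a, b):
    # Quaternion (Hamilton) product: scalar part w, vector part (x, y, z)
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)
```

For example, qmul((0, 0, 0, 1), q) returns q unchanged (the identity quaternion), and qmul((1, 0, 0, 0), (1, 0, 0, 0)) gives (0, 0, 0, -1), i.e. i * i = -1.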
Jan 30 2007
Mikola Lysenko wrote:
[snip]
1. Additional primitive types
byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, ucent3, ucent4
float2, float3, float4, double2, double3, double4, real2, real3, real4
ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, ireal4
cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, creal4

I think the argument makes much sense, but as the proposal adds 57(!) new domain-specific primitive types to the D language specification, I believe it is way too heavy. For discussion, here is a not very thought-out counter-proposal:

1. Make the D 2.0 static array pass-by-value instead of its inconsistent and strange pseudo-reference/value type heritage from C. (Pass by reference could be retained for extern(C) functions only.)
2. Define ABIs for SSE2, SSE3, etc. platforms so that small static arrays can be allocated and passed in SIMD registers.
3. Implement vector operations for static arrays.

Advantages:
* The code would automatically be platform independent. On a non-SIMD-able platform, float[4], byte[8], and so on would all be well defined and work identically to the way they do on SIMD platforms.
* Future platforms would not require changes to the language specification. float[8] could suddenly become SIMD-able, for instance.
* Minor changes to the language specification. Mostly ABI changes.

A much worse, but less radical, proposal would be to revive the C "register" keyword. Instead of:

float4 x;

you get:

register float[4] x;

/Oskar
Jan 30 2007
Oskar Linde wrote:
Mikola Lysenko wrote: [snip]

I must say, I like this version better.

1. Additional primitive types
byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, ucent3, ucent4
float2, float3, float4, double2, double3, double4, real2, real3, real4
ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, ireal4
cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, creal4

I think the arguing makes much sense, but as the proposal adds 57(!) new domain specific primitive types to the D language specification, I believe it is way too heavy. For discussion, here is a not very thought out counter proposal: 1. Make the D 2.0 static array pass-by-value instead of it's inconsistent and strange pseudo-reference/value type heritage from C. (Pass by reference could be retained for extern(C) functions only). 2. Define ABIs for SSE2, SSE3, etc platforms so that small static arrays can be allocated and passed in SIMD registers. 3. Implement vector operations for static arrays. Advantages: * The code would automatically be platform independent. On a non SIMD-able platform, float[4], byte[8], and so on would all be well defined and work identically to the way they do on SIMD platforms. * Future platforms would not require changes to the language specification. float[8] could suddenly become SIMD-able for instance. * Minor changes to the language specification. Mostly ABI changes.

A much worse, but less radical proposal, would be to revive the C "register" keyword. Instead of: float4 x; you get: register float[4] x;

Something like this may unfortunately be necessary if backwards compatibility is to be maintained with pass-by-reference code. :(
Jan 30 2007
Oskar Linde wrote:

For discussion, here is a not very thought out counter proposal: 1. Make the D 2.0 static array pass-by-value instead of it's inconsistent and strange pseudo-reference/value type heritage from C. (Pass by reference could be retained for extern(C) functions only). 2. Define ABIs for SSE2, SSE3, etc platforms so that small static arrays can be allocated and passed in SIMD registers. 3. Implement vector operations for static arrays.

After thinking about it some more, I believe this approach would probably work just as well. Another possibility is to simply restrict vectors to floating-point types only: i.e., float2, float3, float4, double2, double3, double4, real2, real3, real4. This makes the number of new types only 9 instead of 57. The disadvantage is that it is no longer easy to use the MMX extensions, which are specially tailored for integer vectors. Overall, I guess I'd be happy with either situation.
Jan 30 2007
Mikola Lysenko wrote:

After thinking about it some more, I believe this approach would probably work just as well. Another possibility is to simply restrict vectors to floating point types only: i.e., float2, float3, float4, double2, double3, double4, real2, real3, real4. This makes the number of new types only 9 instead of 57. The disadvantage is that it is no longer easy to use the MMX extensions which are specially tailored for integer vectors. Overall, I guess I'd be happy with either situation.

Three- and four-dimensional vectors are nice for physics and whatnot, but for those of us in the machine-learning field, it's commonplace to construct vectors (and to calculate distances, dot products, and cosines between those vectors) using many more dimensions. Right now, I'm working on a statistical prediction algorithm that uses sparse 300,000-dimensional feature vectors. There's no reason to limit SSE optimizations to small-dimensionality vectors. I don't think having special datatypes is the right solution. I think it should be possible to annotate any array with a flag that says "I'll be doing vector math here. Watch out for possible optimizations". Or just having a library of vector functions would probably be sufficient to solve the problem in the majority of cases. --benji
Jan 30 2007
Benji Smith wrote:

Right now, I'm working on a statistical prediction algorithm that uses *sparse* 300,000-dimensional feature vectors. There's no reason to limit SSE optimizations to small-dimensionality vectors.

I can understand your desire to create a universal solution, and for a long time I too was convinced this was the way to go. However, I have come to understand that there are many practical differences between low- and high-dimension vectors. Think about it this way: would you try to use a sparse matrix/vector library for doing operations on 4-d vectors and matrices? It should be clear that these are two very different types of problems. If we tried to use pass-by-value semantics and array operations on gigantic vectors, the result would be equally bad. Obviously, passing a 300k+ element array around by value is going to destroy your performance. The classic solution is to create a series of custom routines for whichever problem you are trying to solve, or perhaps use a robust optimized library like BLAS for the heavy lifting. High-dimension linear algebra is a tricky subject, and it can take years of research to get algorithms with competitive performance. On the other hand, low-dimension vectors require relatively trivial compiler support, and are used far more often. To get a rough estimate of their relative popularity, I invoke Google Code Search:

vector math -sparse : 282k hits http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code
vector sparse : 31k hits http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search

From this informal statistic, we could infer that low-dimensional vectors are used at least an order of magnitude more frequently than their higher-order counterparts. I do not mean to say that fast sparse vector/matrix operations are unimportant; after all, a fast matrix solver can sometimes shave weeks off your processing time.
In such a situation, it is permissible for a library to be slightly cumbersome - just as long as it gets the job done fast. Compare this to small vectors, where many applications place convenience and readability above performance. Integrating small vectors into the core language will not only improve their usability, but it should also provide substantial performance gains.
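The high-dimensional sparse case discussed here is naturally a container problem rather than a register problem. A minimal illustrative sketch (Python, not part of the proposal) of a sparse dot product whose cost depends on the number of nonzero entries rather than the nominal 300,000 dimensions:

```python
def sparse_dot(a, b):
    # Sparse vectors stored as {index: value} maps holding only nonzero entries.
    # Iterate over the smaller map, so the cost is O(min(len(a), len(b)))
    # regardless of the nominal dimensionality.
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b[i] for i, v in a.items() if i in b)
```

No fixed-width SIMD register helps here; the win comes from skipping the zeros entirely, which is exactly the kind of algorithmic choice a library (or BLAS-style routine) makes and a compiler cannot.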
Jan 30 2007
Mikola Lysenko wrote:
Benji Smith wrote:

I don't think it's critical that high-order matrix multiplications get the same performance gains as smaller arrays that use the SIMD. It's just important that it's consistent. Putting vector operations into an array gives us more power because we get all the standard array operations as well (I concede that these operations could be put into the base arrays as well). I hope that eventually dynamic arrays will get the SIMD support as well. That way you could, for instance, load an entire list of vertexes (or whatever) and apply an operation on all of them. D could even do this in parallel. (I concede here that you could probably do that with an array of these base types. Also, vectors with 4 parts can be treated somewhat differently in that the 4th component is sometimes used for other reasons.) I generally think that representing vector types as arrays makes the language more powerful than having individual special-case types. -Joel

Right now, I'm working on a statistical prediction algorithm that uses *sparse* 300,000-dimensional feature vectors. There's no reason to limit SSE optimizations to small-dimensionality vectors.

I can understand your desire to create a universal solution, and for a long time I too was convinced this was the way to go. However I have come to understand that there are many practical differences between low and high dimension vectors. Think about it this way: would you try to use a sparse matrix/vector library for doing operations on 4-d vectors and matrices? It should be clear that these are two very different types of problems. If we tried to use pass-by-value semantics and array operations on gigantic vectors the result would be equally bad. Obviously passing a 300k+ element array around by value is going to destroy your performance.
The classic solution is to create a series of custom routines for whichever problem you are trying to solve, or perhaps use a robust optimized library like BLAS for the heavy lifting. High dimension linear algebra is a tricky subject, and it can take years of research to get algorithms with competitive performance. On the other hand, Low dimension vectors require relatively trivial compiler support, and are used far more often. To get a rough estimate of their relative popularity, I invoke Google CodeSearch: vector math -sparse : 282k hits http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code vector sparse : 31k hits http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search From this informal statistic, we could infer that the low dimensional vectors are used at least an order of magnitude more frequently than their higher order counterparts. I do not mean to say that fast sparse vector/matrix operations are unimportant, after all a fast matrix solver can sometimes shave weeks off your processing time. In such a situation, it is permissible for a library to be slightly cumbersome - just as long as it gets the job done fast. Compare this to small vectors, where many applications place convenience and readability above performance. Integrating small vectors into the core language will not only improve their usability, but it should also provide substantial performance gains.
Feb 01 2007
janderson wrote:
Mikola Lysenko wrote:

I think we should have array operations for dynamic arrays (n-dimensional vectors) as well, but distinct from lo-D static arrays. The reason is the same as others state: hi-D arrays can spend some fixed overhead setting things up before jumping into a gigantic loop, whereas lo-D arrays get optimized much differently, with no fixed overhead and possibly higher running overhead or less generality. They are two different beasts. Since hi-D arrays can deal with some overhead at startup, they are ideal candidates for a library implementation of array operations. There are only one or two features D would need to get there and have this seamless library implementation (maybe they are done already?? I forget), and that may be an easier and more general route than trying to make compiler writers (Walter) write compiler support for both lo-D and hi-D optimization cases.

Benji Smith wrote:

I don't think it's critical that high order matrix multiplications get the same performance gains as smaller arrays that use the SIMD. It's just important that it's consistent. Putting vector operations into an array gives us more power because we get all the standard array operations as well (I concede that these operations could be put into the base arrays as well). I hope that eventually dynamic arrays will get the SIMD support as well. That way you could for instance load an entire list of vertexes (or whatever) and apply an operation on all of them. D could even do this in parallel. (I concede here that you could probably do that with an array of these base types. Also vectors with 4 parts can be treated somewhat differently in that the 4th component is sometimes used for other reasons). I generally think that representing vector types as arrays makes the language more powerful than having individual special-case types. -Joel

Right now, I'm working on a statistical prediction algorithm that uses *sparse* 300,000-dimensional feature vectors.
There's no reason to limit SSE optimizations to small-dimensionality vectors.I can understand your desire to create a universal solution, and for a long time I too was convinced this was the way to go. However I have come to understand that there are many practical differences between low and high dimension vectors. Think about it this way: would you try to use a sparse matrix/vector library for doing operations on 4-d vectors and matrices? It should be clear that these are two different very different types of problems. If we tried to use pass-by-value semantics and array operations on gigantic vectors the result would be equally bad. Obviously passing a 300k+ element array around by value is going to destroy your performance. The classic solution is to create a series of custom routines for whichever problem you are trying to solve, or perhaps use a robust optimized library like BLAS for the heavy lifting. High dimension linear algebra is a tricky subject, and it can take years of research to get algorithms with competitive performance. On the other hand, Low dimension vectors require relatively trivial compiler support, and are used far more often. To get a rough estimate of their relative popularity, I invoke Google CodeSearch: vector math -sparse : 282k hits http://www.google.com/codesearch?q=vector+math+-sparse&hl=en&btnG=Search+Code vector sparse : 31k hits http://www.google.com/codesearch?hl=en&lr=&q=vector+sparse&btnG=Search From this informal statistic, we could infer that the low dimensional vectors are used at least an order of magnitude more frequently than their higher order counterparts. I do not mean to say that fast sparse vector/matrix operations are unimportant, after all a fast matrix solver can sometimes shave weeks off your processing time. In such a situation, it is permissible for a library to be slightly cumbersome - just as long as it gets the job done fast. 
Compare this to small vectors, where many applications place convenience and readability above performance. Integrating small vectors into the core language will not only improve their usability, but it should also provide substantial performance gains.
Feb 01 2007
Chad J wrote:

I think we should have array operations for dynamic arrays (n-dimensional vectors) as well, but distinct from lo-D static arrays. The reason is the same as others state: hi-D arrays can spend some fixed overhead setting things up before jumping into a gigantic loop, whereas lo-D arrays get optimized much differently with no fixed overhead and possibly higher running overhead or less generality. They are two different beasts.

The more I think about it, the more I agree. Hi-D vectors are almost always sparse, so the feature/magnitude pairs are usually stored in a HashMap (or some other similar container) rather than being represented in arrays. Adding compiler optimizations for hi-D vector operators would mean knowing the underlying implementation of the container, and that'd obviously be bad. But I don't think vector optimizations should be constrained to 3D or 4D vectors. Generalize the optimizations on array types, and then any code that performs vector operations on those arrays automatically gets the performance boost, regardless of the dimensionality of their vectors. --benji
Feb 07 2007
Mikola Lysenko wrote:
Benji Smith wrote:

I completely agree. For large vectors and matrices, you're going to have to use a loop, and the loop startup costs are amortised; you can use the same BLAS code for a 3159-element vector as for a 3158-element one, and it makes negligible difference to the speed. But for 3- and 4-element vectors it makes an enormous difference. And the applications are completely different. I imagine games programmers don't use much above 4x4 matrices.

Right now, I'm working on a statistical prediction algorithm that uses *sparse* 300,000-dimensional feature vectors. There's no reason to limit SSE optimizations to small-dimensionality vectors.

I can understand your desire to create a universal solution, and for a long time I too was convinced this was the way to go. However I have come to understand that there are many practical differences between low and high dimension vectors. Think about it this way: would you try to use a sparse matrix/vector library for doing operations on 4-d vectors and matrices? It should be clear that these are two very different types of problems.
Feb 01 2007
Don Clugston wrote:
Mikola Lysenko wrote:

I'd say that's not necessarily true, with games using more and more realistic physics, and even with run-of-the-mill skeletal animation techniques. But in those cases you're talking arbitrary N-vectors, so in any event the most common dimensions for data are: 2 (screen/texture space), 3 (3D world space, homogeneous 2D space, color space), 4 (homogeneous 3D world space), and N (physics and everything else). --bb

Benji Smith wrote:

I completely agree. For large vectors and matrices, you're going to have to use a loop, and the loop startup costs are amortised; you can use the same BLAS code for a 3159 element vector as for a 3158 element one, and it makes negligible difference to the speed. But for 3 and 4 element vectors it makes an enormous difference. And the applications are completely different. I imagine games programmers don't use much above 4x4 matrices.

Right now, I'm working on a statistical prediction algorithm that uses *sparse* 300,000-dimensional feature vectors. There's no reason to limit SSE optimizations to small-dimensionality vectors.

I can understand your desire to create a universal solution, and for a long time I too was convinced this was the way to go. However I have come to understand that there are many practical differences between low and high dimension vectors. Think about it this way: would you try to use a sparse matrix/vector library for doing operations on 4-d vectors and matrices? It should be clear that these are two very different types of problems.
Feb 01 2007
Oskar Linde wrote:
Mikola Lysenko wrote: [snip]

Yes, I agree with Oskar and Walter that all those 57 primitive type names are not needed in the core language, and instead float[4] should act like your float4. But for users, this kind of alias should behave as expected:

alias float[4] float4;

And a std.vecmath library could define all 57 of the aliases above. --bb

1. Additional primitive types
byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, ucent3, ucent4
float2, float3, float4, double2, double3, double4, real2, real3, real4
ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, ireal4
cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, creal4

I think the arguing makes much sense, but as the proposal adds 57(!) new domain specific primitive types to the D language specification, I believe it is way too heavy.
Jan 30 2007
Mikola Lysenko wrote:All new vector types are pass-by-value, and may be used in combination with any number of other types. The following expressions would all be valid:[snip 2 others]char[][ucent4] c;[snip]Note that the only two comparison operators allowed on vectors are == and !=. The problem with <, >, etc. is that there is no commonly agreed upon definition for such things. Therefore the operators are omitted to avoid confusion. int2(1, 2) == int2(1, 2); //ok, evaluates to true float3(1, 0, 0) < float3(0, 0, 1); // error, < is undefined for vectorsThese two sections are contradictory: associative arrays need an ordering to be defined on the key type. How about a simple lexicographical order, with two non-equal vectors (of the same dimensionality) comparing as their first non-equal component would? (That's how comparisons work for arrays IIRC) Other than that I don't see any obvious problems with it.
Jan 30 2007
Frits van Bommel wrote:Mikola Lysenko wrote:Hmm... Fair enough. We could define ordering on a per component level arbitrarily so associative arrays work. I could see this being confusing in some situations, but it would be worth it in order to make the types consistent. Therefore: float3(x, y, z) < float3(a, b, c) would translate to: (x < a) ? true : ((y < b) ? true : (z < c))All new vector types are pass-by-value, and may be used in combination with any number of other types. The following expressions would all be valid:[snip 2 others]char[][ucent4] c;[snip]Note that the only two comparison operators allowed on vectors are == and !=. The problem with <, >, etc. is that there is no commonly agreed upon definition for such things. Therefore the operators are omitted to avoid confusion. int2(1, 2) == int2(1, 2); //ok, evaluates to true float3(1, 0, 0) < float3(0, 0, 1); // error, < is undefined for vectorsThese two sections are contradictory: associative arrays need an ordering to be defined on the key type. How about a simple lexicographical order, with two non-equal vectors (of the same dimensionality) comparing as their first non-equal component would? (That's how comparisons work for arrays IIRC) Other than that I don't see any obvious problems with it.
Jan 30 2007
Mikola Lysenko wrote:Hmm... Fair enough. We could define ordering on a per component level arbitrarily so associative arrays work. I could see this being confusing in some situations, but it would be worth it in order to make the types consistent. Therefore: float3(x, y, z) < float3(a, b, c) would translate to: (x < a) ? true : ((y < b) ? true : (z < c))Your code is wrong when e.g. (x > a) and (y < b). It returns true while it should be false; x != a means ((x, y, z) < (a, b, c)) == (x < a). Think of the order of words in a dictionary: first you compare the first character, if it's equal you go on to the second, and so on. The first character that differs determines the sort order. It'd be implemented something like this: --- int compare (short3 a, short3 b) { if (a.x != b.x) return a.x - b.x; if (a.y != b.y) return a.y - b.y; if (a.z != b.z) return a.z - b.z; return 0; // all elements equal, so the vectors are equal } --- Note 1: I used shorts to produce clear code but avoid overflows. Note 2: This uses the opCmp/TypeInfo.compare return value convention: negative means a < b, positive means a > b, zero means equality.
Jan 30 2007
Frits van Bommel wrote:Mikola Lysenko wrote:Whoops! I wasn't really thinking too carefully when I wrote that. You're correct, a lexicographic ordering is definitely the simplest in this case.Hmm... Fair enough. We could define ordering on a per component level arbitrarily so associative arrays work. I could see this being confusing in some situations, but it would be worth it in order to make the types consistent. Therefore: float3(x, y, z) < float3(a, b, c) would translate to: (x < a) ? true : ((y < b) ? true : (z < c))Your code is wrong when e.g. (x > a) and (y < b). It returns true while it should be false; x != a means ((x, y, z) < (a, b, c)) == (x < a). Think of the order of words in a dictionary: first you compare the first character, if it's equal you go on to the second, and so on. The first character that differs determines the sort order. It'd be implemented something like this: --- int compare (short3 a, short3 b) { if (a.x != b.x) return a.x - b.x; if (a.y != b.y) return a.y - b.y; if (a.z != b.z) return a.z - b.z; return 0; // all elements equal, so the vectors are equal } --- Note 1: I used shorts to produce clear code but avoid overflows. Note 2: This uses the opCmp/TypeInfo.compare return value convention: negative means a < b, positive means a > b, zero means equality.
Jan 30 2007
Mikola Lysenko wrote:Whoops! I wasn't really thinking too carefully when I wrote that. You're correct, a lexicographic ordering is definitely the simplest in this case.It also fits really well with the "plain static array" proposed change: this is already what arrays do[1] :). [1]: Well, if they're the same length. Otherwise they compare by length, presumably for efficiency.
Jan 30 2007
Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.
Jan 30 2007
Walter Bright wrote:Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.Yes, Oskar made a good point in his post. The main issue I was concerned with was the vector literal syntax. Currently array literals are allocated on the heap when you declare them, which would be prohibitively costly in most situations. In order for low-dimension vectors to be useful, they need a simple constructor syntax that does not allocate anything on the heap.
Jan 30 2007
Walter Bright wrote:Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value. What exactly makes small vectors unsuitable for an implementation via library elements? If we distill a list, maybe we also can figure ways in which we can make the language better, not only one special case of it. Andrei
Jan 30 2007
Andrei Alexandrescu (See Website For Email) wrote:Walter Bright wrote:I agree that it's a very bad idea to have such a discontinuity in ref vs value. Static arrays would have to be all by value or all by ref. Then, you'll just see slower speed for the larger ones.Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.
Jan 30 2007
Walter Bright wrote:Andrei Alexandrescu (See Website For Email) wrote:So then arrays by ref and a stdlib template that wraps an array in a struct could make everyone happy. AndreiWalter Bright wrote:I agree that it's a very bad idea to have such a discontinuity in ref vs value. Static arrays would have to be all by value or all by ref. Then, you'll just see slower speed for the larger ones.Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.
Jan 30 2007
Andrei Alexandrescu (See Website For Email) wrote:Walter Bright wrote:I disagree. What you describe is a way of working around a design issue instead of resolving it. The C (and D) static array has always been an odd type. It is not assignable, but is passed by reference to functions. You cannot return static arrays from functions. Template libraries like std.bind have to add lots of extra code to special-case the static arrays. Structs containing static arrays behave differently than static arrays themselves. Also, in D, the static array is the only type where typeof(T.init) != T. If static arrays became a value type, all irregularities could disappear. The only place this change would make any difference for old code is where static arrays are passed as function parameters. I believe the amount of code containing functions with static array arguments is very small in D. In fact, only two Phobos modules contain such functions (std.utf and std.md5). Converting those functions would be trivial: The by-ref behavior would be attainable with the inout and out parameter types that today are forbidden for static arrays. /OskarAndrei Alexandrescu (See Website For Email) wrote:So then arrays by ref and a stdlib template that wraps an array in a struct could make everyone happy.Walter Bright wrote:I agree that it's a very bad idea to have such a discontinuity in ref vs value. Static arrays would have to be all by value or all by ref. Then, you'll just see slower speed for the larger ones.Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.
Feb 01 2007
Oskar Linde wrote:The C (and D) static array has always been an odd type. It is not assignable, but is passed by reference to functions. You cannot return static arrays from functions. Template libraries like std.bind have to add lots of extra code to special-case the static arrays. Structs containing static arrays behave differently than static arrays themselves. Also, in D, the static array is the only type where typeof(T.init) != T. If static arrays became a value type, all irregularities could disappear.But then people will complain about the inconsistency between static arrays and dynamic arrays. Why are some passed by value and others by reference?The only place this change would make any difference for old code is where static arrays are passed as function parameters. I believe the amount of code containing functions with static array arguments is very small in D. In fact, only two Phobos modules contain such functions (std.utf and std.md5). Converting those functions would be trivial: The by-ref behavior would be attainable with the inout and out parameter types that today are forbidden for static arrays.Probably things could be made to work; I personally am unclear on what's the most self-consistent set of rules. Andrei
Feb 01 2007
Andrei Alexandrescu (See Website For Email) wrote:Oskar Linde wrote:I don't see many complaints about the "inconsistency" between structs and classes (which is probably the closest thing in the language currently). Besides, static arrays are already more of a value type (typically allocated on the stack, directly in an aggregate, or in static data, like structs) and dynamic arrays more of a reference type (typically allocated on the heap, like classes). The biggest difference is that dynamic arrays _can_ also refer to static arrays.If static arrays became a value type, all irregularities could disappear.But then people will complain about the inconsistency between static arrays and dynamic arrays. Why are some passed by value and others by reference?
Feb 01 2007
Andrei Alexandrescu (See Website For Email) wrote:Walter Bright wrote:That difference could be made using different notations: void func( float[4] ar ); // by value void funct(T,uint N)( T[N] ar ); void func( float[] ar ); // by ref void funct(T)( T[] ar ); This makes sense, too, since "float[4] x;" is already allocated on the stack.Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.I don't think this is advisable. It practically introduces an arbitrary discontinuity. Think of a template of T and N that manipulates arrays T[N]. All of a sudden, for certain values of T and N, arrays started being passed around by value.What exactly makes small vectors unsuitable for an implementation via library elements? If we distill a list, maybe we also can figure ways in which we can make the language better, not only one special case of it.This is already very doable, thanks to the possibility of calling global functions taking an array as if they were members of the array: float Length( float[] ar ); // by ref float[4] ar; auto len = ar.Length(); L.
Jan 31 2007
Walter Bright wrote:Thank you, this is in line with what I thought you meant. I'm glad I was right. Your proposal is sensible and well thought out. The only improvement I'd make is instead of: float4 f; Have: float[4] f; i.e. make static arrays of 2,3,4 dimensions have the suggested properties. I think this is quite doable.I think some of the properties could be used with multiple dimensions. SIMD could still be used efficiently on these higher orders (just repeat the operation when you run out). This also opens up opportunities for parallel operations in the future. Anything else could go in the standard library. Swizzle is the odd one out. It would have to be a special case. We can only support up to N characters with it, unless you do something fancy like: float[8] value; float[8] value2; value.xyz = value2.x; //xyz in value are set to value2.x value.xyz[xyz] = value2.x; //Sets both value[0-3] to xyz and value[4-7] to value2.x. or (value+4).xyz = value2.x; //Just offset the array (value[4-7] are set to value2.x) -Joel
Jan 31 2007
Having read the proposal, I think it can likely be accommodated entirely with current and up-and-coming language features. To expand: Mikola Lysenko wrote:It is important to make the distinction between low dimension vectors and other higher order arrays. This is crucial since the higher dimensional arrays are often more tenuously connected with any sort of geometric or physical interpretation.Well, I disagree here, but it's not important. Look outside your domain. There are people who think the future belongs to machine learning and statistical methods, which are rife with large-dimensional vectors. I think it's just shortsighted to claim that geometrical and physical interpretation is more special than others.Moreover, many architectures are specially optimized for the lower dimensional cases, and offer special registers which are virtually impossible to properly exploit using libraries alone. The situation is analogous to floating point arithmetic, proper hardware support requires language level integration.I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything to the equation.1. Additional primitive typesLet's try: s/primitive types/types/ig and see what happens, just for the sake of experiment.All numerical primitive types in the language will be supplemented with 3 additional vector extensions. Each vector extension will contain 2, 3 or 4 components since these are best supported by hardware, and the most relevant geometrically. The initial value for each component in each type will be the same as that of its base type. The size of each vector type will be equal to the size of its base type times its dimension. 
Here is a complete list of the new types to be added: Integer types: byte2, byte3, byte4, short2, short3, short4, int2, int3, int4, long2, long3, long4, cent2, cent3, cent4, ubyte2, ubyte3, ubyte4, ushort2, ushort3, ushort4, uint2, uint3, uint4, ulong2, ulong3, ulong4, ucent2, ucent3, ucent4 Floating types: float2, float3, float4, double2, double3, double4, real2, real3, real4 Imaginary types: ifloat2, ifloat3, ifloat4, idouble2, idouble3, idouble4, ireal2, ireal3, ireal4 Complex types: cfloat2, cfloat3, cfloat4, cdouble2, cdouble3, cdouble4, creal2, creal3, creal4My variant: define a vector_ops module: struct small_vector(base, ubyte size) { base[size] data; ... } alias small_vector!(byte, 2) byte2; ...All new vector types are pass-by-value, and may be used in combination with any number of other types.Check this one. Structs are passed by value, and various constructors and (up-and-coming) conversions will take care of the rest.The following expressions would all be valid: int2[] a; float4* b; char[][ucent4] c;Check that one.2. Vector literals It must be possible to declare a vector literal unambiguously as an argument to a function, or in any number of other situations. This syntax must be concise since it will be used often, and it must be flexible, allowing vectors to be constructed via concatenation. In this light, the following is proposed: float4(1, 2, 3, 4); int2(100, 200); creal3(1 + 2i, 3+4i, 5-6i);Constructors. Check that one.Additionally, this syntax will also support the construction of vectors from other sub-vectors. This can be useful for augmenting a 3-vector with a fourth homogeneous component for transformation amongst other things. Here are some examples: float4(float3(x, y, z), 1); // returns (x, y, z, 1) int3(0, int2(a, b)); // returns (0, a, b) creal4(ireal2(r, z), real2(p, q)); // returns (r, z, p, q)Constructors. Check that one too.
It is not possible to use a cast to change the dimension of a vector. cast(float2)(int2(1, 2)); //ok cast(int)(float2(3, 4)); //error, int is not the same dimension as float2 real3(a, b, c) + ireal3(d, e, f); //ok, type of expression is creal3opCast. Yup, check that sucker too.4. Component-Wise Arithmetic All of the arithmetic operators on primitive types will be extended such that when applied to vectors they operate on each component independently. The size and types of the vectors must be compatible with the operator. float4(1, 2, 3, 4) + float4(3, 5, 6, 7); // Vector-Vector operation, returns (4, 7, 9, 11) float3(1, 5, 6) * float3(1, 0, 2); // returns (1, 0, 12) uint2(0xFF00FF, 0x00FF88) & uint2(0xFF, 0xFF); // returns (0xFF, 0x88) int3(x, y, z) * int2(a, b); // error, size of vectors does not matchOverloaded operators. Check.Additionally, scalar-vector operations will be supported. Each scalar-vector operator is applied per-component within the vector. int2(0, 3) * 5; // Vector-Scalar operation, returns (0, 15) byte2(2, 5) + 6; // returns (8, 11)Overloaded operators. Check.Note that the only two comparison operators allowed on vectors are == and !=. The problem with <, >, etc. is that there is no commonly agreed upon definition for such things. Therefore the operators are omitted to avoid confusion. int2(1, 2) == int2(1, 2); //ok, evaluates to true float3(1, 0, 0) < float3(0, 0, 1); // error, < is undefined for vectors Check :o).
Today's D can't do it. Fortunately, code generation techniques that Walter works on right now (possibly literally at the time I'm writing this and/or at the time you're reading this) will enable comfortable generation of combinatorial functions, of which swizzles are actually an excellent motivating example. Here are some examples:float2(a, b).yx; // evaluates to (b, a) float3(1, 2, 3).zzyx; // evaluates to (3, 3, 2, 1) byte4('a', 'b', 'c', 'd').w; // evaluates to 'd' creal3(1 + 2i, 0, 4i).zzzz; // evaluates to (4i, 4i, 4i, 4i) int2(1, 2).z; // error, vector does not have 3 componentsCheck (for D 2.0).These can be useful when projecting homogeneous vectors into a lower-dimension subspace, or when computing various products. Here is an example of a cross product implemented using swizzles: float3 cross(float3 a, float3 b) { return a.yzx * b.zxy - a.zxy * b.yzx; } And here is a simple homogeneous projection: float3 project(float4 hg) { return hg.xyz / hg.w; }Yum. Love that.6. Standard Library Additions[snip] Check by virtue of definition :o). Andrei
Jan 30 2007
Andrei Alexandrescu (See Website For Email) wrote:Having read the proposal, I think it can likely be accommodated entirely with current and up-and-coming language features. To expand:Your responses to the other points seem reasonable and encouraging, but your response to this issue concerning performance is... what? That making optimal use of the hardware is not important? For more specifics of what Mikola finds problematic on the efficiency front see: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=47596 --bbMoreover, many architectures are specially optimized for the lower dimensional cases, and offer special registers which are virtually impossible to properly exploit using libraries alone. The situation is analogous to floating point arithmetic, proper hardware support requires language level integration.I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything to the equation.
Jan 30 2007
Bill Baxter wrote:Andrei Alexandrescu (See Website For Email) wrote:Nonono. I just meant: the fact that today's architectures have dedicated operations and hardware for floating point did not change the semantics of floating point operations in programming languages.Having read the proposal, I think it can likely be accommodated entirely with current and up-and-coming language features. To expand:Your responses to the other points seem reasonable and encouraging, but your response to this issue concerning performance is... what? That making optimal use of the hardware is not important?Moreover, many architectures are specially optimized for the lower dimensional cases, and offer special registers which are virtually impossible to properly exploit using libraries alone. The situation is analogous to floating point arithmetic, proper hardware support requires language level integration.I don't see much of an analogy, but this is also a minor point. Floating point arithmetic was weird way before it was implemented in hardware, and the weirdness reflected in languages. But on many machines adding two floating point numbers required a routine call, which didn't add or subtract anything to the equation.For more specifics of what Mikola finds problematic on the efficiency front see: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=47596Thanks. I'm seeing: 1. Function calls I agree that optimization of compound operations a la d = a * b + c is hard, but it should be addressed in a principled way (e.g. expression templates). 2. Register allocation This has been discussed in an older thread on comp.lang.c++.moderated. Basically current compiler technology conservatively puts every "serious" data structure (e.g. structs with constructors) in real memory, not registers. I see this as more of a limitation of current compiler technology, than a fundamental issue. Of course, I do no work on improving said technology, which lessens my point. 
:o) 3. Data alignment D must add primitives to define types aligned at arbitrary boundaries. This will not only help small vectors, but a host of other systems-level software as well (e.g. allocators). Andrei
Jan 30 2007
Andrei Alexandrescu (See Website For Email) wrote:Mikola Lysenko wrote:I presume the original proposal would have allowed both cast(float2)(int2(1, 2)) and cast(real2)(int2(1, 2)). Currently, however, we can have only one opCast per struct/class. Is this a planned D 2.0 change I've missed, or how would this work?3. Casting rules Vector types will obey the same casting rules as their base type within a single dimension. It is not possible to use a cast to change the dimension of a vector. cast(float2)(int2(1, 2)); //ok cast(int)(float2(3, 4)); //error, int is not the same dimension as float2 real3(a, b, c) + ireal3(d, e, f); //ok, type of expression is creal3opCast. Yup, check that sucker too.
Feb 01 2007
Deewiant wrote:Andrei Alexandrescu (See Website For Email) wrote:It is a change that I personally think should be made. Walter devised the one cast per class rule out of caution when facing an issue known to be potentially disastrous. Generally, the "when in doubt, don't put it in" approach turned out to be a winner for D in general --- the "length" Chernobyl disaster notwithstanding :o). As our grasp of the tradeoffs involved improves, the right choices will appear naturally. This is sort of what happened with opAssign. AndreiMikola Lysenko wrote:I presume the original proposal would have allowed both cast(float2)(int2(1, 2)) and cast(real2)(int2(1, 2)). Currently, however, we can have only one opCast per struct/class. Is this a planned D 2.0 change I've missed, or how would this work?3. Casting rules Vector types will obey the same casting rules as their base type within a single dimension. It is not possible to use a cast to change the dimension of a vector. cast(float2)(int2(1, 2)); //ok cast(int)(float2(3, 4)); //error, int is not the same dimension as float2 real3(a, b, c) + ireal3(d, e, f); //ok, type of expression is creal3opCast. Yup, check that sucker too.
Feb 01 2007
Andrei Alexandrescu (See Website For Email) wrote:It is a change that I personally think should be made. Walter devised the one cast per class rule out of caution when facing an issue known to be potentially disastrous. Generally, the "when in doubt, don't put it in" approach turned out to be a winner for D in general --- the "length" Chernobyl disaster notwithstanding :o).Yes, it's the "when we don't know what the semantics of this combination of features should be, make it an error" rule. That way, when it sometimes eventually becomes clear what those semantics should be, we don't have a backwards compatibility problem.
Feb 01 2007
Walter Bright wrote:Andrei Alexandrescu (See Website For Email) wrote:That's how D started without templates and ended up with one of the best template systems around. Same, I think, will happen to macros. AndreiIt is a change that I personally think should be made. Walter devised the one cast per class rule out of caution when facing an issue known to be potentially disastrous. Generally, the "when in doubt, don't put it in" approach turned out to be a winner for D in general --- the "length" Chernobyl disaster notwithstanding :o).Yes, it's the "when we don't know what the semantics of this combination of features should be, make it an error" rule. That way, when it sometimes eventually becomes clear what those semantics should be, we don't have a backwards compatibility problem.
Feb 01 2007
Another thing I realized would be nice to have is some intrinsic functions. Here are some things I can think of offhand: /// Saturated componentwise addition of two vectors. adds( T vector1, T vector2 ); /// Multiply Add. madd( T vector1, T vector2, T vector3 ); /// Shifts all bits in "vector" right or left, respectively. shiftr( T vector ); shiftl( T vector ); /** Comparison routines that return masks. Compare the operands in the appropriate way. If the comparison is true for a particular set of components, the resulting component will be filled with all 1's, otherwise it will be filled with all 0's. These should be hardware-supported on CPUs with SSE and MMX, which are used in most desktops today and probably for a looong time. */ compareLess( T vector1, T vector2 ); // returns vector1 < vector2 compareLessEquals( T vector1, T vector2 ); // returns vector1 <= vector2 compareGreater( T vector1, T vector2 ); // returns vector1 > vector2 ... (you get the idea) I'm looking forward to some slick SIMD support in D! I'm getting excited just to see this considered. *stoked* Oh and speaking of intrinsics, std.intrinsics is lacking the kinda obvious rotate left and rotate right (rol/ror) :(
Jan 30 2007
I think it is better to make a very general framework for describing the vector operations, because when we have a very general description the compiler is free to optimize as it sees fit. You can think of it as an SQL for vector operations. See http://all-technology.com/eigenpolls/dwishlist/index.php?it=10 There is also a suggestion for a general form of Swizzle operations http://all-technology.com/eigenpolls/dwishlist/index.php?it=11 I think that you forget that a program using vectors typically uses a lot of them and therefore needs to transform a lot at a time. Example: take a model using 1000 vectors. Your suggestion would define it like float3 model[1000], and would place the vectors in memory as x1 y1 z1 x2 y2 z2 x3 y3 z3 . . . each vector following the other. But now imagine that the most used operation in the program is a plane projection onto the x-y plane; then we only need the x and y coordinates. An arrangement in memory which looks like x1 x2 x3 . . . y1 y2 y3 . . . z1 z2 z3 . . . would mean the CPU could load only the x and y coordinates into the cache, which would speed up the calculations a lot. So, if one could specify 3x1000 floats and let the compiler arrange them in memory, that would be optimal.
Feb 02 2007
Knud Soerensen wrote:You can think of it as an SQL for vector operations. See http://all-technology.com/eigenpolls/dwishlist/index.php?it=10 There is also a suggestion for a general form of Swizzle operations http://all-technology.com/eigenpolls/dwishlist/index.php?it=11 I think that you forget that a program using vectors typically uses a lot of them and therefore needs to transform a lot at a time. Example: take a model using 1000 vectors. Your suggestion would define it like float3 model[1000], and would place the vectors in memory as x1 y1 z1 x2 y2 z2 x3 y3 z3 . . . each vector following the other. But now imagine that the most used operation in the program is a plane projection onto the x-y plane; then we only need the x and y coordinates. An arrangement in memory which looks like x1 x2 x3 . . . y1 y2 y3 . . . z1 z2 z3 . . . would mean the CPU could load only the x and y coordinates into the cache, which would speed up the calculations a lot. So, if one could specify 3x1000 floats and let the compiler arrange them in memory, that would be optimal.I think this is a special case. Typically I'd say the programmer would be applying operations on several components at a time. SIMD hardware is set up this way. Maybe the compiler could detect the case you're talking about; however, I think eventually the elements in the array would want to be used as components. I do however think a long list of values (such as a vertex buffer) would gain from SIMD-optimized operations (future hardware will add more and more array support and parallelism).I think it is better to make a very general framework for describing the vector operations, because when we have a very general description the compiler is free to optimize as it sees fit.This I agree with. The first versions need not provide SIMD optimizations; they just need to be done in such a way that these can be added later without changing our code. 
I can imagine situations in the future where we write, for instance, a multiply and then an add. The compiler sees this (re-arranging code as it needs to) and changes it to a SIMD muladd; i.e. arrays become just another numerical datatype the compiler can manipulate. This sort of thing will help reduce the micro-optimization needed for this type of coding. -Joel
Feb 02 2007
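The source-level pattern Joel imagines might look like the following C sketch (the function name `mul_then_add` is invented for illustration): two plain loops that a vectorizing compiler is, in principle, free to fuse into one pass and lower to SIMD multiply-add instructions, with no micro-optimization by the programmer.

```c
/* Element-wise multiply followed by an element-wise add, written as two
   separate loops. A compiler that re-arranges code as it needs to could
   fuse these into a single loop and emit a SIMD multiply-add per group
   of elements; the source stays architecture-neutral either way. */
void mul_then_add(const float *a, const float *b, const float *c,
                  float *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = a[i] * b[i];     /* multiply ... */
    for (int i = 0; i < n; i++)
        out[i] = out[i] + c[i];   /* ... then add: a fusable muladd */
}
```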
Knud Soerensen wrote:
> I think it is better to make a very general framework for describing
> vector operations, because when we have a very general description,
> the compiler is free to optimize as it sees fit. You can think of it
> as a SQL for vector operations.

Well, I can't say that I share your enthusiasm for the power of necromantic compiler optimizations. If you want to see what a vectorizing compiler looks like, take a gander at: http://www.codeplay.com/

VectorC is a C compiler which focuses on architecture-specific SIMD optimizations. The problem with their approach is that it is too general. If you want to create optimized vector code, you need to tweak things just-so to make the compiler understand it. This means tedious trial-and-error optimization sessions where you end up rewriting the same piece of C code 50 times in order to get the compiler to spit out the right assembler. To facilitate this process, they provide several tools and pragmas you can use to give the compiler hints. By the time you are done, you have a very fast vector-optimized SSE function that only works on one specific compiler. Good luck porting your results to GCC!

Does it save you time at the end of the day? Maybe, but what happens when some future release changes the optimizer and your code no longer comes out all clean and vectorized? You have to go back and repeat the process from scratch. For the amount of hair-pulling these compilers cause, I would just as soon write the damn code in assembler and be done with it once and for all.

The basic issue with VectorC and all the other vectorizing compilers is that they try to do too much. Everyone seems so hell-bent on creating this hyper-general-super-chocolate-coated-vectorizing-optimizer, when in reality it's not even necessary. Small geometric vectors are far more common, and improving their utility is a much bigger win for less effort.
We need a convenient and fast set of arithmetic routines for doing simple geometric stuff, not an overarching super vector-framework.

> ... imagine that the most used operation in the program is a projection
> onto the x-y plane; then we only need the x and y coordinates. So an
> arrangement in memory which looks like
> x1 x2 x3 . . . y1 y2 y3 . . . z1 z2 z3 . . .
> would let the cpu load only the x and y coordinates into the cache ...
> So, if one could specify 3x1000 floats and let the compiler arrange
> them in memory, that would be optimal.

This doesn't strike me as very realistic. How is the compiler ever going to figure out that you wanted to project these vectors onto the x-y plane somewhere down the road? This is a very difficult global optimization, and I can think of no way to perform it without direct programmer intervention.

Moreover, I'm not sure why you split up the x and y components. A much better layout would look like this:

x y x y x y x y x y ...
z z z ...

Now the alignment of each vector is such that you could load 2 x-y pairs into a single SSE register at once. This is not only better for caching, but it is also better for performance.

However, neither the layout you proposed nor the one above makes much sense. Typically you want to do more with vectors than just drop the z-component. You need to perform matrix multiplication, normalization, dot and cross products, etc. Each of these operates most efficiently when all vector components are packed into a single SIMD register. Therefore it seems clear that the preferred layout ought to be:

x y z w x y z w ...

I can't really fathom why you would want to go through the extra trouble of splitting up each component. Not only does it make arithmetic less efficient, but it also creates a bookkeeping nightmare. Imagine trying to grow the total number of vectors in your original layout! You would need to do multiple costly memcpy operations.

For low-dimensional vectors, the necessary compiler optimizations are obvious, and there is one clear 'best' data layout. We don't need any fancy compiler magic, since the code can be directly translated into vector operations just like ordinary arithmetic.
Feb 03 2007
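The packed x y z w layout Mikola argues for can be sketched in C (again in C rather than D for illustration; `Vec4` and `dot4` are invented names): each vector's four components are contiguous, so one vector lines up with one SIMD register.

```c
/* One four-float vector per slot: x y z w x y z w ...
   With each vector's components packed together, per-vector operations
   (dot product, normalization, matrix multiply) can load an operand
   with a single register move instead of gathering from separate
   component arrays. */
typedef struct { float v[4]; } Vec4;   /* v[0..3] = x, y, z, w */

float dot4(const Vec4 *a, const Vec4 *b)
{
    float s = 0.0f;
    for (int i = 0; i < 4; i++)
        s += a->v[i] * b->v[i];
    return s;
}
```

The scalar loop here stands in for what a compiler or intrinsics would turn into a single packed SSE operation.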
Mikola Lysenko wrote:
> ... Each of these operates most efficiently when all vector components
> are packed into a single SIMD register. Therefore it seems clear that
> the preferred layout ought to be: x y z w x y z w ...
> For low-dimensional vectors, the necessary compiler optimizations are
> obvious, and there is one clear 'best' data layout. We don't need any
> fancy compiler magic, since the code can be directly translated into
> vector operations just like ordinary arithmetic.

I couldn't agree more with this last part. However, I must say that for very particular problems in the past I've found that splitting structs (vectors or whatever) up by their components can have significant performance gains when you're doing batch operations on one component, due to the cache. However, this is such a special case that it would be very difficult for the compiler to figure out. It's something I believe the programmer should do, not the compiler. -Joel
Feb 03 2007
On Sat, 03 Feb 2007 13:18:55 -0500, Mikola Lysenko wrote:
> Well, I can't say that I share your enthusiasm for the power of
> necromantic compiler optimizations. ... By the time you are done, you
> have a very fast vector-optimized SSE function that only works on one
> specific compiler. Good luck porting your results to GCC! ... For the
> amount of hair-pulling these compilers cause, I would just as soon
> write the damn code in assembler and be done with it once and for all.

I agree with you on this. We need a way to write the vector code once and be able to compile it on many architectures with good results.

> Small geometric vectors are far more common, and improving their
> utility is a much bigger win for less effort.

That might be true for your projects, but not for mine.

> We need a convenient and fast set of arithmetic routines for doing
> simple geometric stuff, not an overarching super vector-framework.

Actually, what I need is an overarching super TENSOR-framework ;-)

> This is a very difficult global optimization, and I can think of no
> way to perform it without direct programmer intervention.

Yes, the optimization might be difficult, but the point is that the code should not impose artificial limitations on the compiler. If the compiler knows how to make the optimization, then it is free to make it.

> Moreover, I'm not sure why you split up the x and y components. A much
> better layout would look like this: x y x y x y x y x y ... z z z ...
> This is not only better for caching, but it is also better for
> performance.

Good, a better example :-)

> However, neither the layout you proposed nor the one above makes much
> sense.

I did not mean to propose a specific layout; what I tried to say was that the compiler should be free to choose the layout which fits the problem.

> Typically you want to do more with vectors than just drop the
> z-component. You need to perform matrix multiplication, normalization,
> dot and cross products, etc. Each of these operates most efficiently
> when all vector components are packed into a single SIMD register.
> Therefore it seems clear that the preferred layout ought to be:
> x y z w x y z w ...

Then the compiler should use this layout for those typical problems, but we should not limit the compiler to generating sub-optimal code for non-typical problems.

> I can't really fathom why you would want to go through the extra
> trouble of splitting up each component. ... Imagine trying to grow the
> total number of vectors in your original layout! You would need to do
> multiple costly memcpy operations.

Imagine trying to grow the number of dimensions in your layout ;-)

> For low-dimensional vectors, the necessary compiler optimizations are
> obvious, and there is one clear 'best' data layout. We don't need any
> fancy compiler magic, since the code can be directly translated into
> vector operations just like ordinary arithmetic.

Yes, good performance for typical problems is important, but we should be careful not to choose a notation which ruins the performance for non-typical problems.
Feb 03 2007