digitalmars.D - Does dmd have SSE intrinsics?
- Jeremie Pelletier (3/3) Aug 26 2009 While writing SSE assembly by hand in D is fun and works well, I'm wonde...
- Don (13/18) Sep 21 2009 I know this is an old post, but since it wasn't answered...
- dsimcha (6/24) Sep 21 2009 performance.
- bearophile (18/20) Sep 21 2009 The idea is to improve array operations so they become a handy way to ef...
- Don (10/33) Sep 21 2009 (1) They don't take advantage of fixed-length arrays. In particular,
- Jeremie Pelletier (11/52) Sep 21 2009 I agree that a -arch switch of some sort would the best thing to hit
- bearophile (16/21) Sep 21 2009 In my answer I have forgotten to say another small thing.
- Jeremie Pelletier (8/37) Sep 21 2009 That 16bytes alignment is a restriction of the current usage of bit
- Robert Jacques (6/45) Sep 21 2009 Yes, but the unaligned version is slower, even for aligned data.
- bearophile (5/9) Sep 22 2009 Why doesn't D allow to return fixed-sized arrays from functions? It's a ...
- Robert Jacques (10/20) Sep 22 2009 [snip]
- Daniel Keep (20/43) Sep 22 2009 The problem is that currently you have a class of types which can be
- Jeremie Pelletier (8/56) Sep 22 2009 Why would you declare void variables? The point of declaring typed
- Lutger (5/14) Sep 22 2009 exactly: thus 'return foo;' in generic code can mean 'return;' when foo ...
- Christopher Wright (14/23) Sep 22 2009 It simplifies generic code a fair bit. Let's say you want to intercept a...
- Jeremie Pelletier (9/37) Sep 22 2009 I don't get how void could be used to simplify generic code. You can
- Daniel Keep (34/42) Sep 22 2009 You can't take the address of a return value. I'm not even sure you
- Jeremie Pelletier (11/71) Sep 22 2009 Oops sorry! I tend to forget the semantics and syntax of D1, I haven't
- Robert Jacques (11/43) Sep 22 2009 Because auto returns suffer from forward referencing problems :
- Andrei Alexandrescu (4/32) Sep 22 2009 Yah, but inside "do something interesting" you need to do special casing...
- Christopher Wright (4/8) Sep 23 2009 Sure, but if you're writing a generic library you can punt the problem
- Andrei Alexandrescu (7/55) Sep 22 2009 Yah, same in std.variant. I think there it's called
- Robert Jacques (12/34) Sep 22 2009 [snip]
- grauzone (12/49) Sep 22 2009 I think static arrays should be value types. Then this isn't a problem
- Andrei Alexandrescu (4/59) Sep 22 2009 I think that already works.
- Daniel Keep (22/30) Sep 22 2009 Here's an OLD example:
- Andrei Alexandrescu (8/46) Sep 22 2009 ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
- Jeremie Pelletier (4/54) Sep 22 2009 Calling into a framehandler for such a trivial routine, especially if
- Andrei Alexandrescu (3/59) Sep 22 2009 I guess that's what the smiley was about!
- Jeremie Pelletier (3/65) Sep 22 2009 I thought it meant "there, problem solved!"
- Michel Fortin (30/35) Sep 23 2009 Here's some generic code that would benefit from void as a variable
- Christopher Wright (6/29) Sep 22 2009 You could ease the restriction by disallowing implicit conversion from
- Robert Jacques (7/38) Sep 22 2009 I'm not sure what you're referencing.
- Don (10/15) Sep 22 2009 The problem is that difference today is so extreme. On core2:
- Jeremie Pelletier (4/23) Sep 22 2009 I wasn't aware of that, and here I was wondering why my SSE code was
- #ponce (3/15) Sep 22 2009 Indeed SSE is known to be overkill when dealing with unaligned data.
- Jeremie Pelletier (8/24) Sep 22 2009 The D memory manager already aligns data on 16 bytes boundaries. The
- Robert Jacques (5/34) Sep 22 2009 Yes, although classes have hidden vars, which are runtime dependent,
- Jeremie Pelletier (8/46) Sep 22 2009 Ah yes, you are right. Then I guess it really is up to the programmer to...
- Christopher Wright (7/10) Sep 22 2009 Um, no. Field accesses for class variables are (pointer + offset).
- Robert Jacques (10/20) Sep 22 2009 Clarification: I meant slicing an array of value types. i.e. if the size...
- bearophile (93/95) Sep 22 2009 LDC doesn't align to 16 the normal arrays inside functions:
- bearophile (11/18) Sep 22 2009 As usual this discussion is developing into other directions that are bo...
While writing SSE assembly by hand in D is fun and works well, I'm wondering if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C. The reason is that the compiler can usually reorder the intrinsics to optimize performance. I could always use C code to implement my SSE routines but then I'd lose the ability to inline them in D.
Aug 26 2009
Jeremie Pelletier wrote:
> While writing SSE assembly by hand in D is fun and works well, I'm wondering if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C. The reason is that the compiler can usually reorder the intrinsics to optimize performance. I could always use C code to implement my SSE routines but then I'd lose the ability to inline them in D.

I know this is an old post, but since it wasn't answered... Make sure you know what the SSE intrinsics actually *do* in VC++/Intel! I've read many complaints about how poorly they perform on all compilers -- the penalty for allowing them to be reordered is that extra instructions are often added, which means that straightforward C code is sometimes faster! In this regard, I'm personally excited about array operations. I think the need for SSE intrinsics and vectorisation is a result of abstraction inversion: the instruction set is higher-level than the "high level language"! Array operations allow D to catch up with asm again. When array operations get implemented properly, it'll be interesting to see how much need for SSE intrinsics remains.
Sep 21 2009
== Quote from Don (nospam nospam.com)'s article
> Jeremie Pelletier wrote:
>> While writing SSE assembly by hand in D is fun and works well, I'm wondering if the compiler has intrinsics for its instruction set, much like xmmintrin.h in C. The reason is that the compiler can usually reorder the intrinsics to optimize performance. I could always use C code to implement my SSE routines but then I'd lose the ability to inline them in D.
> [snip]
> Array operations allow D to catch up with asm again. When array operations get implemented properly, it'll be interesting to see how much need for SSE intrinsics remains.

What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.
Sep 21 2009
dsimcha:
> What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.

The idea is to improve array operations so they become a handy way to efficiently use present and future (AVX too, http://en.wikipedia.org/wiki/Advanced_Vector_Extensions ) vector instructions. So for example if in my D code I have:

float[4] a = [1.0f, 2, 3, 4];
float[4] b = 10f;
float[4] c;
c[] = a[] + b[];

the compiler has to use a single inlined SSE instruction to implement the 4-float sum, and two instructions to load & broadcast the float value 10 into a whole XMM register. If the D code is:

float[8] a = [1.0f, 2, 3, 4, 5, 6, 7, 8];
float[8] b = [10.0f, 20, 30, 40, 50, 60, 70, 80];
float[8] c;
c[] = a[] + b[];

the current vector instructions aren't wide enough to do that in a single instruction (future AVX will be able to), so the compiler has to inline two SSE instructions. Currently such operations are implemented with calls to a function (which also tests what vector instructions are available); that slows down code when you only have to sum 4 floats. Another problem is that some important semantics are missing, for example some shuffling operations and a few other things. With some care, most or all such operations (keeping a good eye on AVX too) can be mapped to built-in array methods. The problem here is that you don't want to tie the D language too closely to the currently available vector instructions, because in 5-10 years CPUs may change. What you want is to add enough semantics that later the compiler can compile the code as best it can (with scalar instructions, with SSE1, with a future 1024-bit-wide AVX, or with something unknown today).

If the language doesn't give the compiler enough semantics, you are forced to do what GCC does now: try to infer vector operations from normal code. That is a complex thing and usually not as efficient as using GCC's SSE intrinsics. This is something that deserves a thread here :-) In the end, implementing all this doesn't look hard; it's mostly a matter of designing it well (whereas auto-vectorization as in GCC is harder to implement).

Bye,
bearophile
Sep 21 2009
dsimcha wrote:
> == Quote from Don (nospam nospam.com)'s article
> [snip]
> What's wrong with the current implementation of array ops (other than a few misc. bugs that have already been filed)? I thought they already use SSE if available.

(1) They don't take advantage of fixed-length arrays. In particular, operations on float[4] should be a single SSE instruction (no function call, no loop, nothing). This will make a huge difference to game and graphics programmers, I believe.
(2) The operations don't block on cache size.
(3) DMD doesn't allow you to generate code assuming minimum CPU capabilities. (In fact, when generating inline asm, the CPU type is 8086! (this is in bugzilla)) This limits the possible use of (1).
It's issue (1) which is the killer.
Sep 21 2009
Don wrote:
> [snip]
> (1) They don't take advantage of fixed-length arrays. In particular, operations on float[4] should be a single SSE instruction (no function call, no loop, nothing). This will make a huge difference to game and graphics programmers, I believe.
> (2) The operations don't block on cache size.
> (3) DMD doesn't allow you to generate code assuming minimum CPU capabilities. (In fact, when generating inline asm, the CPU type is 8086! (this is in bugzilla)) This limits the possible use of (1).
> It's issue (1) which is the killer.

I agree that a -arch switch of some sort would be the best thing to hit dmd. It is already most useful in gcc, which supported up to core2 when I last used it. I wrote a linear algebra module with support for 2D, 3D, 4D vectors, quaternions, 3x2 and 4x4 matrices, all with template structs so I can declare them for float, double, or real components. I used SSE for the bigger operations, which grew the module size considerably. This is where I first started looking for SSE intrinsics. It would also be greatly helpful if the compiler could generate SSE code by itself; it would save a LOT of inline assembly for simple operations.
Sep 21 2009
Don:
> (1) They don't take advantage of fixed-length arrays. In particular, operations on float[4] should be a single SSE instruction (no function call, no loop, nothing). This will make a huge difference to game and graphics programmers, I believe.
> [snip]
> It's issue (1) which is the killer.

In my answer I have forgotten to say another small thing. The std.gc.malloc() of D returns pointers aligned to 16 bytes (though I may like to add a second argument to that GC malloc to specify the alignment; this can be used to save some memory when the alignment isn't necessary), while I think std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes. In the following code, if you want to implement the last line with one vector instruction, then the a and b arrays have to be aligned to 16 bytes. I think that currently LDC doesn't align a and b to 16 bytes.

float[4] a = [1.0f, 2, 3, 4];
float[4] b = 10f;
float[4] c;
c[] = a[] + b[];

So you may need a syntax like the following, which is not handy:

align(16) float[4] a = [1.0f, 2, 3, 4];
align(16) float[4] b = 10f;
align(16) float[4] c;
c[] = a[] + b[];

A possible solution is to automatically align to 16 bytes (by default, but it can be changed to save stack space in specific situations) all static arrays allocated on the stack too :-) A note: in the future CPU vector instructions will probably relax their alignment requirements... it's already happening.

Bye,
bearophile
Sep 21 2009
bearophile wrote:
> [snip]
> A possible solution is to automatically align to 16 bytes (by default, but it can be changed to save stack space in specific situations) all static arrays allocated on the stack too :-)

That 16-byte alignment is a restriction of the current usage of bit fields. Since every bit in the field indexes a single 16-byte block, a simple shift 4 bits to the right translates a pointer into its index in the bit field. You could align on 4-byte boundaries, but at the cost of doubling the size of the bit fields, and possibly having slower collection runs. Doesn't SSE have aligned and unaligned versions of its move instructions, like MOVAPS and MOVUPS?
Sep 21 2009
On Mon, 21 Sep 2009 18:32:50 -0400, Jeremie Pelletier <jeremiep gmail.com> wrote:
> [snip]
> Doesn't SSE have aligned and unaligned versions of its move instructions? like MOVAPS and MOVUPS.

Yes, but the unaligned version is slower, even for aligned data. Also, another issue for game/graphic/robotic programmers is the ability to return fixed length arrays from functions. Though struct wrappers mitigate this.
Sep 21 2009
Robert Jacques:
> Yes, but the unaligned version is slower, even for aligned data.

This is true today, but in the future it may become a little less true, thanks to improvements in the CPUs.

> Also, another issue for game/graphic/robotic programmers is the ability to return fixed length arrays from functions. Though struct wrappers mitigate this.

Why doesn't D allow returning fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations; it looks more useful than most of the recent features implemented in D2.

Bye,
bearophile
Sep 22 2009
On Tue, 22 Sep 2009 07:09:09 -0400, bearophile <bearophileHUGS lycos.com> wrote:
> Why doesn't D allow to return fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations, it looks more useful than most of the last features implemented in D2.

[snip]
Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now-invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).
Sep 22 2009
Robert Jacques wrote:
> [snip]
> Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids the breaking the logical semantics of arrays (i.e. pass by reference).

The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

returnT!(S) get(S)();

where you have:

template returnT(T)
{
    static if( isStaticArrayType!(T) )
        alias typeof(T.dup) returnT;
    else
        alias T returnT;
}

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.
Sep 22 2009
Daniel Keep wrote:
> [snip]
> P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

Why would you declare void variables? The point of declaring typed variables is to know what kind of storage to use; void means no storage at all. The only time I use void in variable types is for void* and void[] (which really is just a void* with a length). In fact, every single scope has an infinity of void variables, you just don't need to explicitly declare them :) 'void foo;' is the same semantically as ''.
Sep 22 2009
Jeremie Pelletier wrote:
> ...
> In fact, every single scope has an infinity of void variables, you just don't need to explicitly declare them :) 'void foo;' is the same semantically as ''.

Exactly: thus 'return foo;' in generic code can mean 'return;' when foo is of type void. This is similar to how 'return foo();' is already allowed when foo itself returns void.
Sep 22 2009
Jeremie Pelletier wrote:
> Why would you declare void variables? [snip]

It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write:

ReturnType!(func) func(ParameterTupleOf!(func) params)
{
    auto result = innerObj.func(params);
    // do something interesting
    return result;
}

Except then you get the error: voids have no value. So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
Sep 22 2009
Christopher Wright wrote:
> [snip]
> Except then you get the error: voids have no value. So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.

I don't get how void could be used to simplify generic code. You can already use type unions and variants for that, and if you need a single more generic type you can always use void* to point to the data. Besides, in your above example, suppose the interesting thing it's doing is to modify the result data; how would the compiler know how to modify void? It would just push the error back to the next statement. Why don't you just replace ReturnType!func by auto and let the compiler resolve the return type to void?
Sep 22 2009
Jeremie Pelletier wrote:
> I don't get how void could be used to simplify generic code. You can already use type unions and variants for that and if you need a single more generic type you can always use void* to point to the data.

You can't take the address of a return value. I'm not even sure you could define a union type that would function generically without specialising on void anyway. And using a Variant is just ridiculous; it's adding runtime overhead that is completely unnecessary.

> Besides in your above example, suppose the interesting thing its doing is to modify the result data, how would the compiler know how to modify void? It would just push back the error to the next statement.

Example from actual code:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    alias ReturnType!(Fn) returnT;
    static if( is( returnT == void ) )
        Fn(args);
    else
        auto result = Fn(args);
    glCheckError();
    static if( !is( returnT == void ) )
        return result;
}

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
    auto result = Fn(args);
    glCheckError();
    return result;
}

I don't CARE about the result. If I did, I wouldn't be allowing voids at all, or I would be special-casing on it anyway and it wouldn't be an issue. The point is that there is NO WAY in a generic function to NOT care what the return type is. You have to, even if it ultimately doesn't matter.

> Why don't you just replace ReturnType!func by auto and let the compiler resolve the return type to void?

Well, there's this thing called "D1". Quite a few people use it. Especially since D2 isn't finished yet.
Sep 22 2009
Daniel Keep wrote:
> [snip]
> The point is that there is NO WAY in a generic function to NOT care what the return type is. You have to, even if it ultimately doesn't matter.

Oops sorry! I tend to forget the semantics and syntax of D1, I haven't used it since I first found out about D2! I would have to agree that you do make a good point here; void values could be useful in such a case, so long as the value is only assigned by method calls and not modified locally. Basically in your example, 'auto result' would just mean "use no storage and ignore return statements on result if auto resolves to void, but keep the value around until I return result if auto resolves to any other type".

Jeremie
Sep 22 2009
On Tue, 22 Sep 2009 19:40:03 -0400, Jeremie Pelletier <jeremiep gmail.com> wrote:
> [snip]
> Why don't you just replace ReturnType!func by auto and let the compiler resolve the return type to void?

Because auto returns suffer from forward referencing problems:

// Bad
auto x = bar;
auto bar() { return foo; }
auto foo() { return 1.0; }

// Okay
auto foo() { return 1.0; }
auto bar() { return foo; }
auto x = bar;
Sep 22 2009
Christopher Wright wrote:

Jeremie Pelletier wrote:

Yah, but inside "do something interesting" you need to do special casing anyway.

Andrei

Why would you declare void variables? The point of declaring typed variables is to know what kind of storage to use, void means no storage at all. The only time I use void in variable types is for void* and void[] (which really is just a void* with a length). In fact, every single scope has an infinity of void variables, you just don't need to explicitly declare them :) 'void foo;' is the same semantically as ''.

It simplifies generic code a fair bit. Let's say you want to intercept a method call transparently -- maybe wrap it in a database transaction, for instance. I do similar things in dmocks. Anyway, you need to store the return value. You could write:

    ReturnType!(func) func(ParameterTupleOf!(func) params)
    {
        auto result = innerObj.func(params);
        // do something interesting
        return result;
    }

Except then you get the error: voids have no value. So instead you need to do some amount of special casing, perhaps quite a lot if you have to do something with the function result.
Sep 22 2009
Andrei Alexandrescu wrote:

Yah, but inside "do something interesting" you need to do special casing anyway.

Andrei

Sure, but if you're writing a generic library you can punt the problem to the user, who may or may not care about the return value at all. As is, it's a cost you pay whether you care or not.
Sep 23 2009
Daniel Keep wrote:

Robert Jacques wrote:

Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.

On Tue, 22 Sep 2009 07:09:09 -0400, bearophile <bearophileHUGS lycos.com> wrote:

The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

    returnT!(S) get(S)();

where you have:

    template returnT(T)
    {
        static if( isStaticArrayType!(T) )
            alias typeof(T.dup) returnT;
        else
            alias T returnT;
    }

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.

Robert Jacques:

[snip]

Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).

Also, another issue for game/graphic/robotic programmers is the ability to return fixed length arrays from functions. Though struct wrappers mitigate this.

Why doesn't D allow returning fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations; it looks more useful than most of the last features implemented in D2.

Bye,
bearophile

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

How would you use them?

Andrei
Sep 22 2009
On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

Daniel Keep wrote:

[snip]

Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.

The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

    returnT!(S) get(S)();

where you have:

    template returnT(T)
    {
        static if( isStaticArrayType!(T) )
            alias typeof(T.dup) returnT;
        else
            alias T returnT;
    }

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.

Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Sep 22 2009
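The struct-wrapper workaround mentioned above can be sketched as follows. This is a minimal illustration, not code from the thread; the Vec4 name and the cross-product body are made up for the example:

```d
// A fixed-size array embedded in a struct gets value semantics,
// so it can be returned from a function without dangling.
struct Vec4
{
    float[4] data;
}

Vec4 cross(Vec4 a, Vec4 b)
{
    Vec4 r;
    r.data[0] = a.data[1] * b.data[2] - a.data[2] * b.data[1];
    r.data[1] = a.data[2] * b.data[0] - a.data[0] * b.data[2];
    r.data[2] = a.data[0] * b.data[1] - a.data[1] * b.data[0];
    r.data[3] = 0.0f;
    return r; // copied by value, eligible for RVO
}
```

This is the "compiler is wrapping the array in a struct" idea done by hand: the copy stays on the stack and no heap allocation is needed.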
Robert Jacques wrote:

On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

I think static arrays should be value types. Then this isn't a problem anymore, and returning a static array can be handled exactly like returning structs. Didn't Walter once say that a type shouldn't behave differently if it's wrapped into a struct? With current static array semantics, this rule is violated. Whether a static array has reference or value semantics kind of depends on whether it's inside a struct: if you copy a struct, the embedded static array obviously loses its reference semantics.

Also, I second that it should be possible to declare void variables. It'd be really useful for doing return value handling when transparently wrapping delegate calls in generic code.

Daniel Keep wrote:

[snip]

Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.

The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

    returnT!(S) get(S)();

where you have:

    template returnT(T)
    {
        static if( isStaticArrayType!(T) )
            alias typeof(T.dup) returnT;
        else
            alias T returnT;
    }

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.

Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.
Sep 22 2009
grauzone wrote:

Robert Jacques wrote:

Yah.

On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

I think static arrays should be value types. Then this isn't a problem anymore, and returning a static array can be handled exactly like returning structs. Didn't Walter once say that a type shouldn't behave differently if it's wrapped into a struct? With current static array semantics, this rule is violated. Whether a static array has reference or value semantics kind of depends on whether it's inside a struct: if you copy a struct, the embedded static array obviously loses its reference semantics.

Daniel Keep wrote:

[snip]

Well, what is the correct handling? Struct style RVO or delegate auto-magical heap allocation? Something else? Both solutions are far from perfect. RVO breaks the reference semantics of arrays, though it works for many common cases and is high performance. This would be my choice, as I would like to efficiently return short vectors from functions. Delegate style heap allocation runs into the whole I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd imagine this would be better for generic code, since it would always work.

The problem is that currently you have a class of types which can be passed as arguments but cannot be returned. For example, Tango's Variant has this horrible hack where the ACTUAL definition of Variant.get is:

    returnT!(S) get(S)();

where you have:

    template returnT(T)
    {
        static if( isStaticArrayType!(T) )
            alias typeof(T.dup) returnT;
        else
            alias T returnT;
    }

I can't recall the number of times this stupid hole in the language has bitten me. As for safety concerns, it's really no different to allowing people to return delegates. Not a very good reason, but I *REALLY* hate having to special-case static arrays.

Yah, same in std.variant. I think there it's called DecayStaticToDynamicArray!T. Has someone added the correct handling of static arrays to bugzilla? Walter wants to implement it, but we want to make sure it's not forgotten.

Also, I second that it should be possible to declare void variables. It'd be really useful for doing return value handling when transparently wrapping delegate calls in generic code.

I think that already works.

Andrei
Sep 22 2009
Andrei Alexandrescu wrote:

Daniel Keep wrote:

Here's an OLD example:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        alias ReturnType!(Fn) returnT;
        static if( is( returnT == void ) )
            Fn(args);
        else
            auto result = Fn(args);
        glCheckError();
        static if( !is( returnT == void ) )
            return result;
    }

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        auto result = Fn(args);
        glCheckError();
        return result;
    }

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

How would you use them?

Andrei
Sep 22 2009
Daniel Keep wrote:

Andrei Alexandrescu wrote:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        scope(exit) glCheckError();
        return Fn(args);
    }

:o)

Andrei

Daniel Keep wrote:

Here's an OLD example:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        alias ReturnType!(Fn) returnT;
        static if( is( returnT == void ) )
            Fn(args);
        else
            auto result = Fn(args);
        glCheckError();
        static if( !is( returnT == void ) )
            return result;
    }

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        auto result = Fn(args);
        glCheckError();
        return result;
    }

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

How would you use them?

Andrei
Sep 22 2009
Andrei Alexandrescu wrote:

Daniel Keep wrote:

Calling into a frame handler for such a trivial routine, especially if used with real-time rendering, is definitely not a good idea, no matter how elegant its syntax is!

Andrei Alexandrescu wrote:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        scope(exit) glCheckError();
        return Fn(args);
    }

:o)

Andrei

Daniel Keep wrote:

Here's an OLD example:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        alias ReturnType!(Fn) returnT;
        static if( is( returnT == void ) )
            Fn(args);
        else
            auto result = Fn(args);
        glCheckError();
        static if( !is( returnT == void ) )
            return result;
    }

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        auto result = Fn(args);
        glCheckError();
        return result;
    }

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

How would you use them?

Andrei
Sep 22 2009
Jeremie Pelletier wrote:

Andrei Alexandrescu wrote:

I guess that's what the smiley was about!

Andrei

Daniel Keep wrote:

Calling into a frame handler for such a trivial routine, especially if used with real-time rendering, is definitely not a good idea, no matter how elegant its syntax is!

Andrei Alexandrescu wrote:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        scope(exit) glCheckError();
        return Fn(args);
    }

:o)

Andrei

Daniel Keep wrote:

Here's an OLD example:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        alias ReturnType!(Fn) returnT;
        static if( is( returnT == void ) )
            Fn(args);
        else
            auto result = Fn(args);
        glCheckError();
        static if( !is( returnT == void ) )
            return result;
    }

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        auto result = Fn(args);
        glCheckError();
        return result;
    }

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

How would you use them?

Andrei
Sep 22 2009
Andrei Alexandrescu wrote:

Jeremie Pelletier wrote:

I thought it meant "there, problem solved!" :o)

Andrei Alexandrescu wrote:

I guess that's what the smiley was about!

Andrei

Daniel Keep wrote:

Calling into a frame handler for such a trivial routine, especially if used with real-time rendering, is definitely not a good idea, no matter how elegant its syntax is!

Andrei Alexandrescu wrote:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        scope(exit) glCheckError();
        return Fn(args);
    }

:o)

Andrei

Daniel Keep wrote:

Here's an OLD example:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        alias ReturnType!(Fn) returnT;
        static if( is( returnT == void ) )
            Fn(args);
        else
            auto result = Fn(args);
        glCheckError();
        static if( !is( returnT == void ) )
            return result;
    }

This function is used to wrap OpenGL calls so that error checking is performed automatically. Here's what it would look like if we could use void variables:

    ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
    {
        auto result = Fn(args);
        glCheckError();
        return result;
    }

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

How would you use them?

Andrei
Sep 22 2009
On 2009-09-22 12:32:25 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:

Daniel Keep wrote:

Here's some generic code that would benefit from void as a variable type in the D/Objective-C bridge. Basically, it keeps the result of a function call, does some cleaning, and returns the result (with value conversions if needed). Unfortunately, you need a separate path for functions that return void:

    // Call Objective-C code that may raise an exception here.
    static if (is(R == void))
        func(objcArgs);
    else
        ObjcType!(R) objcResult = func(objcArgs);

    _NSRemoveHandler2(&_localHandler);

    // Converting return value.
    static if (is(R == void))
        return;
    else
        return decapsulate!(R)(objcResult);

It could be rewritten in a simpler way if void variables were supported:

    // Call Objective-C code that may raise an exception here.
    ObjcType!(R) objcResult = func(objcArgs);

    _NSRemoveHandler2(&_localHandler);

    // Converting return value.
    return decapsulate!(R)(objcResult);

Note that returning a void resulting from a function call already works in D. You just can't "store" the result of such functions in a variable. That said, it's not a big hassle in this case, thanks to static if. What suffers most is code readability.

--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

P.S. And another thing while I'm at it: why can't we declare void variables? This is another thing that really complicates generic code.

How would you use them?
Sep 23 2009
Robert Jacques wrote:

On Tue, 22 Sep 2009 07:09:09 -0400, bearophile <bearophileHUGS lycos.com> wrote:

You could ease the restriction by disallowing implicit conversion from static to dynamic arrays in certain situations. A function returning a dynamic array cannot return a static array; you cannot assign the return value of a function returning a static array to a dynamic array. Or in those cases, put the static array on the heap.

Robert Jacques:

[snip]

Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).

Also, another issue for game/graphic/robotic programmers is the ability to return fixed length arrays from functions. Though struct wrappers mitigate this.

Why doesn't D allow returning fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations; it looks more useful than most of the last features implemented in D2.

Bye,
bearophile
Sep 22 2009
On Tue, 22 Sep 2009 19:06:22 -0400, Christopher Wright <dhasenan gmail.com> wrote:

Robert Jacques wrote:

I'm not sure what you're referencing.

On Tue, 22 Sep 2009 07:09:09 -0400, bearophile <bearophileHUGS lycos.com> wrote:

You could ease the restriction by disallowing implicit conversion from static to dynamic arrays in certain situations. A function returning a dynamic array cannot return a static array; you cannot assign the return value of a function returning a static array to a dynamic array. Or in those cases, put the static array on the heap.

Robert Jacques:

[snip]

Well, fixed length arrays are an implicit/explicit pointer to some (stack/heap) allocated memory. So returning a fixed length array usually means returning a pointer to now invalid stack memory. Allowing fixed-length arrays to be returned by value would be nice, but basically means the compiler is wrapping the array in a struct, which is easy enough to do yourself. Using wrappers also avoids breaking the logical semantics of arrays (i.e. pass by reference).

Also, another issue for game/graphic/robotic programmers is the ability to return fixed length arrays from functions. Though struct wrappers mitigate this.

Why doesn't D allow returning fixed-sized arrays from functions? It's a basic feature that I can find useful in many situations; it looks more useful than most of the last features implemented in D2.

Bye,
bearophile

A function returning a dynamic array cannot return a static array;

This is already true; you have to .dup the array to return it.

you cannot assign the return value of a function returning a static array to a dynamic array.

This is already sorta true; once the return value is assigned to a static array, it may then be implicitly cast to dynamic. Neither of which helps the situation.
Sep 22 2009
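The .dup requirement discussed above can be sketched as follows. This is a minimal illustration of the rule, not code from the thread; the function name and values are made up:

```d
// A static array lives on the stack, so an implicit
// static-to-dynamic conversion on return would dangle.
float[] make()
{
    float[4] tmp = [1.0f, 2.0f, 3.0f, 4.0f];
    // return tmp;   // would return a slice of dead stack memory
    return tmp.dup;  // heap copy: safe, but one allocation per call
}
```

This is exactly the cost being debated: the copy makes the escape safe, at the price of a heap allocation every time the function is called.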
bearophile wrote:

Robert Jacques:

The problem is that the difference today is so extreme. On Core2:

    movaps [mem128], xmm0; // aligned, 1 micro-op
    movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!

In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.

Yes, but the unaligned version is slower, even for aligned data.

This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
Sep 22 2009
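The aligned/unaligned store forms above can be written in DMD-style inline assembler. A minimal sketch, assuming 32-bit x86, that XMM0 already holds the value to store, and that the caller guarantees p's alignment for the movaps path:

```d
// Store XMM0 to *p; the commented line is the unaligned variant.
void store16(float* p)
{
    asm
    {
        mov EAX, p;
        movaps [EAX], XMM0; // requires EAX to be 16-byte aligned;
                            // 1 micro-op on Core2
        // movups [EAX], XMM0; // works on any address, but ~9 micro-ops
        //                     // on Core2 even when EAX is aligned
    }
}
```

The point of the thread applies here: on Core2 you cannot just use movups defensively, because the penalty is paid even when the data happens to be aligned.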
Don wrote:

bearophile wrote:

I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad. I now recall using a lot of movups instructions; thanks for the tip.

Robert Jacques:

The problem is that the difference today is so extreme. On Core2:

    movaps [mem128], xmm0; // aligned, 1 micro-op
    movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!

In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.

Yes, but the unaligned version is slower, even for aligned data.

This is true today, but in future it may become a little less true, thanks to improvements in the CPUs.
Sep 22 2009
Indeed SSE is known to be overkill when dealing with unaligned data. In C++, writing SSE code is so painful you either have to use intrinsics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignment is not in standard C++. D already understands array operations like Eigen does, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many other things to do :) ).

In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.

I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad. I now recall using a lot of movups instructions; thanks for the tip.
Sep 22 2009
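When alignment cannot be proven statically, it can still be checked once at runtime and used to dispatch between an aligned and an unaligned code path. A small sketch; the helper name is made up for the example:

```d
// True when p sits on a 16-byte boundary (SSE movaps requirement).
bool isAligned16(void* p)
{
    return (cast(size_t)p & 15) == 0;
}
```

The check is a single AND per array, so doing it once before a loop (rather than per element) keeps the dispatch overhead negligible next to the movaps/movups difference.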
#ponce wrote:

The D memory manager already aligns data on 16-byte boundaries. The only case I can think of right now is when data is in a struct or class:

    struct
    {
        float[4] vec1; // aligned!
        int a;
        float[4] vec2; // unaligned!
    }

Indeed SSE is known to be overkill when dealing with unaligned data. In C++, writing SSE code is so painful you either have to use intrinsics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignment is not in standard C++. D already understands array operations like Eigen does, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many other things to do :) ).

In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.

I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad. I now recall using a lot of movups instructions; thanks for the tip.
Sep 22 2009
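The unaligned-second-field problem above can in principle be addressed with D's align attribute. A sketch only: it assumes the compiler honors align(16) on fields and that the instance itself comes from a 16-byte aligned allocator, neither of which was guaranteed by DMD at the time:

```d
// Hypothetical layout if align(16) on members were honored:
align(16) struct Vectors
{
    float[4] vec1;           // offset 0: aligned
    int a;                   // offset 16
    align(16) float[4] vec2; // padded up to offset 32 instead of 20
}
```

The trade-off is 12 bytes of padding per instance in exchange for being able to use movaps on both vectors.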
On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier <jeremiep gmail.com> wrote:

#ponce wrote:

Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (therefore offset). And then there's the whole slicing from an array issue.

The D memory manager already aligns data on 16-byte boundaries. The only case I can think of right now is when data is in a struct or class:

    struct
    {
        float[4] vec1; // aligned!
        int a;
        float[4] vec2; // unaligned!
    }

Indeed SSE is known to be overkill when dealing with unaligned data. In C++, writing SSE code is so painful you either have to use intrinsics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignment is not in standard C++. D already understands array operations like Eigen does, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many other things to do :) ).

In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.

I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad. I now recall using a lot of movups instructions; thanks for the tip.
Sep 22 2009
Robert Jacques wrote:

On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier <jeremiep gmail.com> wrote:

Ah yes, you are right. Then I guess it really is up to the programmer to know if the data is aligned or not and select different code paths based on it. Adding checks at runtime just adds to the overhead we're trying to save by using SSE in the first place. It would be great if we could declare aliases to asm instructions and use template functions with a (bool aligned = true) parameter, setting a movps alias to either movaps or movups depending on the value of aligned.

#ponce wrote:

Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (therefore offset). And then there's the whole slicing from an array issue.

The D memory manager already aligns data on 16-byte boundaries. The only case I can think of right now is when data is in a struct or class:

    struct
    {
        float[4] vec1; // aligned!
        int a;
        float[4] vec2; // unaligned!
    }

Indeed SSE is known to be overkill when dealing with unaligned data. In C++, writing SSE code is so painful you either have to use intrinsics, or use libraries like Eigen (a SIMD vectorization library based on expression templates, which can generate SSE, AVX or FPU code). But using such a library is often way too intrusive, and alignment is not in standard C++. D already understands array operations like Eigen does, in order to increase cacheability. It would be great if it could statically detect 16-byte aligned data and perform SSE when possible (though there must be many other things to do :) ).

In practice it's about an 8X speed difference! On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops. On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access. It all depends on how important you think performance on Core2 and earlier Intel processors is.

I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my core2 quad. I now recall using a lot of movups instructions; thanks for the tip.
Sep 22 2009
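The alias-to-instruction idea above could be approximated today with a string mixin selected by a compile-time flag. A hypothetical sketch only (names made up; assumes 32-bit DMD-style inline asm and that XMM0 holds the value to store):

```d
// Pick the store mnemonic at compile time from an 'aligned' flag.
template movps(bool aligned)
{
    const char[] movps = aligned ? "movaps" : "movups";
}

// store!(true) emits movaps, store!(false) emits movups;
// the choice costs nothing at runtime.
void store(bool aligned = true)(float* p)
{
    mixin("asm { mov EAX, p; " ~ movps!(aligned) ~ " [EAX], XMM0; }");
}
```

This keeps a single source path while letting the caller, who knows the alignment guarantee, select the fast instruction statically instead of paying for a runtime check.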
Robert Jacques wrote:

Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (therefore offset). And then there's the whole slicing from an array issue.

Um, no. Field accesses for class variables are (pointer + offset). Successive subclasses append their fields to the object, so if you sliced an object and changed its vtbl pointer, you could get a valid instance of its superclass. If the class layout weren't determined at compile time, field accesses would be as slow as virtual function calls.
Sep 22 2009
On Tue, 22 Sep 2009 18:56:12 -0400, Christopher Wright <dhasenan gmail.com> wrote:

Robert Jacques wrote:

Clarification: I meant slicing an array of value types, i.e. if the size of the value type isn't a multiple of 16, then the alignment will change (e.g. float3[]). As for classes, yes, the compiler knows, but the point is that you don't know the size and therefore alignment of your super-class. Worse, it could change with different run-times or OSes. So trying to manually align things by introducing spacing vars, etc. is hard, error-prone, and non-portable.

Yes, although classes have hidden vars, which are runtime dependent, changing the offset. Structs may be embedded in other things (therefore offset). And then there's the whole slicing from an array issue.

Um, no. Field accesses for class variables are (pointer + offset). Successive subclasses append their fields to the object, so if you sliced an object and changed its vtbl pointer, you could get a valid instance of its superclass. If the class layout weren't determined at compile time, field accesses would be as slow as virtual function calls.
Sep 22 2009
Jeremie Pelletier:

The D memory manager already aligns data on 16 bytes boundaries. The only case I can think of right now is when data is in a struct or class:

LDC doesn't align to 16 the normal arrays inside functions. A small test program:

    void main()
    {
        float[4] a = [1.0f, 2.0, 3.0, 4.0];
        float[4] b, c;
        b[] = 10.0f;
        c[] = a[] + b[];
    }

The ll code (the asm of the LLVM) LDC produces, this is the head:

    ldc -O3 -inline -release -output-ll vect1.d

    define x86_stdcallcc i32 _Dmain(%"char[][]" %unnamed) {
    entry:
        %a = alloca [4 x float], align 4    ; <[4 x float]*> [#uses=5]
        %b = alloca [4 x float], align 4    ; <[4 x float]*> [#uses=4]
        %c = alloca [4 x float], align 4    ; <[4 x float]*> [#uses=4]
        %.gc_mem = call noalias i8* _d_newarrayvT(%object.TypeInfo* _D11TypeInfo_Af6__initZ, i32 4)    ; <i8*> [#uses=5]
        [...]

The asm it produces for the whole main (the call to the array op is inlined, while _d_array_init_float is not inlined, I don't know why):

    ldc -O3 -inline -release -output-s vect1.d

    _Dmain:
        pushl %esi
        subl $64, %esp
        movl $4, 4(%esp)
        movl $_D11TypeInfo_Af6__initZ, (%esp)
        call _d_newarrayvT
        movl $1065353216, (%eax)
        movl $1073741824, 4(%eax)
        movl $1077936128, 8(%eax)
        movl $1082130432, 12(%eax)
        movl 8(%eax), %ecx
        movl %ecx, 56(%esp)
        movl 4(%eax), %ecx
        movl %ecx, 52(%esp)
        movl (%eax), %eax
        movl %eax, 48(%esp)
        movl $1082130432, 60(%esp)
        leal 32(%esp), %esi
        movl %esi, (%esp)
        movl $2143289344, 8(%esp)
        movl $4, 4(%esp)
        call _d_array_init_float
        leal 16(%esp), %eax
        movl %eax, (%esp)
        movl $2143289344, 8(%esp)
        movl $4, 4(%esp)
        call _d_array_init_float
        movl %esi, (%esp)
        movl $1092616192, 8(%esp)
        movl $4, 4(%esp)
        call _d_array_init_float
        movss 48(%esp), %xmm0
        addss 32(%esp), %xmm0
        movss %xmm0, 16(%esp)
        movss 52(%esp), %xmm0
        addss 36(%esp), %xmm0
        movss %xmm0, 20(%esp)
        movss 56(%esp), %xmm0
        addss 40(%esp), %xmm0
        movss %xmm0, 24(%esp)
        movss 60(%esp), %xmm0
        addss 44(%esp), %xmm0
        movss %xmm0, 28(%esp)
        xorl %eax, %eax
        addl $64, %esp
        popl %esi
        ret $8

By the way, using Link-Time Optimization and interning, LDC produces this LL (whole main):

    define x86_stdcallcc i32 _Dmain(%"char[][]" %unnamed) {
    entry:
        %b = alloca [4 x float], align 4    ; <[4 x float]*> [#uses=1]
        %c = alloca [4 x float], align 4    ; <[4 x float]*> [#uses=1]
        %.gc_mem = call noalias i8* _d_newarrayvT(%object.TypeInfo* _D11TypeInfo_Af6__initZ, i32 4)    ; <i8*> [#uses=4]
        %.gc_mem1 = bitcast i8* %.gc_mem to float*    ; <float*> [#uses=1]
        store float 1.000000e+00, float* %.gc_mem1
        %tmp3 = getelementptr i8* %.gc_mem, i32 4    ; <i8*> [#uses=1]
        %0 = bitcast i8* %tmp3 to float*    ; <float*> [#uses=1]
        store float 2.000000e+00, float* %0
        %tmp4 = getelementptr i8* %.gc_mem, i32 8    ; <i8*> [#uses=1]
        %1 = bitcast i8* %tmp4 to float*    ; <float*> [#uses=1]
        store float 3.000000e+00, float* %1
        %tmp5 = getelementptr i8* %.gc_mem, i32 12    ; <i8*> [#uses=1]
        %2 = bitcast i8* %tmp5 to float*    ; <float*> [#uses=1]
        store float 4.000000e+00, float* %2
        %tmp8 = getelementptr [4 x float]* %b, i32 0, i32 0    ; <float*> [#uses=2]
        call void _d_array_init_float(float* nocapture %tmp8, i32 4, float 0x7FF8000000000000)
        %tmp9 = getelementptr [4 x float]* %c, i32 0, i32 0    ; <float*> [#uses=1]
        call void _d_array_init_float(float* nocapture %tmp9, i32 4, float 0x7FF8000000000000)
        call void _d_array_init_float(float* nocapture %tmp8, i32 4, float 1.000000e+01)
        ret i32 0
    }

Bye,
bearophile
Sep 22 2009
Robert Jacques:
> Well, fixed-length arrays are an implicit/explicit pointer to some
> (stack/heap) allocated memory. So returning a fixed-length array usually
> means returning a pointer to now-invalid stack memory. Allowing
> fixed-length arrays to be returned by value would be nice, but basically
> means the compiler is wrapping the array in a struct, which is easy
> enough to do yourself. Using wrappers also avoids breaking the logical
> semantics of arrays (i.e. pass by reference).

As usual this discussion is developing into other directions that are both interesting and borderline too complex for me :-)

Arrays are the most common and useful data structure (besides single values/variables). And experience shows me that in some situations static arrays can lead to higher performance (for example, if you have a matrix and its number of columns is known at compile time and is a power of 2, then the compiler can use just a shift to find a cell). So I'd like to see the D management of such arrays improved (for me it's a MUCH more common problem than, for example, the contravariant argument types discussed by Andrei recently. I am for improving simple things that I can understand and use every day first, and complex things later. D2 is getting too difficult for me), even if some extra annotations are necessary.

The possible ways that can be useful:
- To return small arrays (for example the ones used by SSE/AVX registers) by value, with no need to create silly wrapper structs. The compiler has to show a performance warning when such an array is bigger than 1024 bytes of RAM.
- LLVM has good stack-allocated (alloca) arrays, like the ones introduced by C99. Having a way to use them in D too would be good.
- A way to return just the reference to a dynamic array when the function already takes the reference to it as input.
- To automatically allocate and copy returned static arrays on the heap, to keep the situation safe and avoid too many copies of large arrays (so it gets copied only once here). I'm not sure about this.

Bye,
bearophile
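The struct-wrapper idiom Robert Jacques describes can be sketched like this (Float4 and makeVector are hypothetical names): a fixed-length array embedded in a struct is copied by value, so the struct can be returned from a function without leaving a dangling pointer into the callee's stack frame.

```d
// Sketch of the wrapper idiom: the fixed-length array lives inside a
// struct, so returning the struct copies the whole array by value.
struct Float4 {
    float[4] data;
}

Float4 makeVector(float x) {
    Float4 v;
    v.data[] = x;   // fill all four elements
    return v;       // safe: the struct (and its array) is copied out
}

void main() {
    Float4 v = makeVector(3.0f);
    assert(v.data[2] == 3.0f);
}
```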
Sep 22 2009