digitalmars.D - Can we fix reverse operator overloading (opSub_r et. al.)?
- Don (25/25) Jul 07 2009 Currently, template versions of opXXX_r appear to be useless.
- Lars T. Kyllingstad (3/35) Jul 07 2009 I recently ran into the exact same issue, so I agree completely.
- Andrei Alexandrescu (4/36) Jul 07 2009 Good point. This also reminds me that the entire operator overloading
- Lars T. Kyllingstad (4/43) Jul 07 2009 I assume that means you haven't written the "operator overloading"
- Walter Bright (2/4) Jul 09 2009 :-(
- Andrei Alexandrescu (4/9) Jul 09 2009 It's run its course like an old BMW. We need to do new things, and
- Don (8/19) Jul 10 2009 Indeed, I even think that the concept of operator overloading is wrong.
- Lars T. Kyllingstad (15/33) Jul 10 2009 That is true. There is, for instance, a good reason why the basic BLAS
- Don (25/68) Jul 10 2009 Curiously, in the DMD front-end, that's almost what happens with array
- Lars T. Kyllingstad (16/89) Jul 10 2009 There are actually three (four) basic types of vector/matrix
- Robert Jacques (4/11) Jul 11 2009 Actually, matrix multiplication and the dot product are both cases of th...
- "Jérôme M. Berger" (7/19) Jul 11 2009 And you can add the outer product...
- Lars T. Kyllingstad (8/20) Jul 13 2009 Actually, the dot product is both a special case of matrix
- Daniel Keep (7/31) Jul 13 2009 You could always convert to using downscode:
- Andrei Alexandrescu (3/5) Jul 10 2009 What are the limitations of multiple-argument []?
- Bill Baxter (14/20) Jul 10 2009 I don't know what Don had in mind, but multi-dimensional slice is
- Lars T. Kyllingstad (7/31) Jul 13 2009 Either, D needs built-in multidimensional arrays, or they must be
- Don (11/32) Jul 13 2009 And I don't think the latter option is necessarily correct. If A is an
- Jarrett Billingsley (16/21) Jul 10 2009 Something like C#'s would be nice:
- Robert Jacques (8/13) Jul 10 2009 Don might be refering to the $ operator, but that more applies to slicin...
- "Jérôme M. Berger" (9/52) Jul 11 2009 Actually, this has already been done in C++:
- Lutger (9/26) Jul 11 2009 Someone correct me if I'm wrong, but I think what Blade does is a bit mo...
- Robert Jacques (14/44) Jul 12 2009 Flens (and several other libraries like it) just provide a syntactic sug...
- "Jérôme M. Berger" (25/74) Jul 12 2009
- Robert Jacques (8/69) Jul 12 2009 For arrays this is never, ever the case. Array ops are memory bandwidth ...
- Rainer Deyke (12/17) Jul 10 2009 I disagree for two reasons:
- BCS (10/13) Jul 10 2009 Some (many?) of the interesting optimizations that can be applied to exp...
- Rainer Deyke (38/53) Jul 10 2009 The compiler shouldn't make any assumptions. The compiler doesn't
- Jarrett Billingsley (17/68) Jul 10 2009 You're somewhat wrong on two points:
- Rainer Deyke (13/28) Jul 10 2009 Yes and no in both cases. Yes, the way way DMD is set up makes these
- BCS (2/10) Jul 11 2009 When that happens we can all retire. :)
- BCS (15/44) Jul 11 2009 Algebraic optimizations would be optimizations that work by applying stu...
- Rainer Deyke (14/26) Jul 11 2009 There are two possible cases:
- BCS (8/39) Jul 11 2009 Duh :)
- ponce (3/18) Jul 12 2009 Wow, I didn't know that.
Currently, template versions of opXXX_r appear to be useless. Consider: struct A { int opSub(T)(T x) { return 1; } int opSub_r(T)(T x) { return -1; } } A a; int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ??? This is annoying and rather silly. I can't imagine a case where you would template opXXX_r, without also templating opXXX. And by definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r has a broken design in both D1 and D2. We could fix it by adding a rule to operator overloading: [New rule] 1. If a and b have the same type, the expression is written as a.opfunc(b). [Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, then overloading is applied across all of them and the best match is used. ... Patching this is pretty easy. For example, in DMD2.031, opover.c, line 263, simply add: if ((t1 == t2)) { // x OP x can only be x.opOP(x), not x.opOP_r(x) s_r = NULL; }
Jul 07 2009
Don wrote:Currently, template versions of opXXX_r appear to be useless. Consider: struct A { int opSub(T)(T x) { return 1; } int opSub_r(T)(T x) { return -1; } } A a; int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ??? This is annoying and rather silly. I can't imagine a case where you would template opXXX_r, without also templating opXXX. And by definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r has a broken design in both D1 and D2. We could fix it by adding a rule to operator overloading: [New rule] 1. If a and b have the same type, the expression is written as a.opfunc(b). [Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, then overloading is applied across all of them and the best match is used. ... Patching this is pretty easy. For example, in DMD2.031, opover.c, line 263, simply add: if ((t1 == t2)) { // x OP x can only be x.opOP(x), not x.opOP_r(x) s_r = NULL; }I recently ran into the exact same issue, so I agree completely. -Lars
Jul 07 2009
Don wrote:Currently, template versions of opXXX_r appear to be useless. Consider: struct A { int opSub(T)(T x) { return 1; } int opSub_r(T)(T x) { return -1; } } A a; int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ??? This is annoying and rather silly. I can't imagine a case where you would template opXXX_r, without also templating opXXX. And by definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r has a broken design in both D1 and D2. We could fix it by adding a rule to operator overloading: [New rule] 1. If a and b have the same type, the expression is written as a.opfunc(b). [Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, then overloading is applied across all of them and the best match is used. ... Patching this is pretty easy. For example, in DMD2.031, opover.c, line 263, simply add: if ((t1 == t2)) { // x OP x can only be x.opOP(x), not x.opOP_r(x) s_r = NULL; }Good point. This also reminds me that the entire operator overloading feature must be thrown away and redesigned. Andrei
Jul 07 2009
Andrei Alexandrescu wrote:Don wrote:I assume that means you haven't written the "operator overloading" chapter of your book yet? ;) -LarsCurrently, template versions of opXXX_r appear to be useless. Consider: struct A { int opSub(T)(T x) { return 1; } int opSub_r(T)(T x) { return -1; } } A a; int x = a - a; // is this a.opSub(a) or a.opSub_r(a) ??? This is annoying and rather silly. I can't imagine a case where you would template opXXX_r, without also templating opXXX. And by definition, a.opSub(a) is identical to a.opSub_r(a). Templated opSub_r has a broken design in both D1 and D2. We could fix it by adding a rule to operator overloading: [New rule] 1. If a and b have the same type, the expression is written as a.opfunc(b). [Existing rules] 2. If any a.opfunc or b.opfunc_r functions exist, then overloading is applied across all of them and the best match is used. ... Patching this is pretty easy. For example, in DMD2.031, opover.c, line 263, simply add: if ((t1 == t2)) { // x OP x can only be x.opOP(x), not x.opOP_r(x) s_r = NULL; }Good point. This also reminds me that the entire operator overloading feature must be thrown away and redesigned. Andrei
Jul 07 2009
Andrei Alexandrescu wrote:This also reminds me that the entire operator overloading feature must be thrown away and redesigned.:-(
Jul 09 2009
Walter Bright wrote:Andrei Alexandrescu wrote:It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. AndreiThis also reminds me that the entire operator overloading feature must be thrown away and redesigned.:-(
Jul 09 2009
Andrei Alexandrescu wrote:Walter Bright wrote:Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations. Despite that, I'd still like to see this patch (or something like it) especially in DMD1 so that we can have a _reasonable_ syntax ASAP.Andrei Alexandrescu wrote:It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. AndreiThis also reminds me that the entire operator overloading feature must be thrown away and redesigned.:-(
Jul 10 2009
Don wrote:Andrei Alexandrescu wrote:That is true. There is, for instance, a good reason why the basic BLAS matrix multiplication routine calculates a A B + b C (a,b: scalars; A,B,C: matrices) instead of just AB. Would/could one gain something, performance-wise, by having such "expression overloading" as a built-in feature of the language itself, rather than as a library? BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -LarsWalter Bright wrote:Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.Andrei Alexandrescu wrote:It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. AndreiThis also reminds me that the entire operator overloading feature must be thrown away and redesigned.:-(
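[Editor's note: the fused BLAS form Lars mentions is not an accident of the interface. Computing a*A*B + b*C in one routine lets the product term accumulate straight into the result, so no temporary matrix for A*B is ever allocated. A minimal C++ sketch of the idea; this is illustrative only, not the real BLAS gemm signature:]

```cpp
#include <cassert>
#include <vector>

// Hypothetical fused routine (not the actual BLAS API):
// C = a*A*B + b*C for row-major n x n matrices stored flat.
// The product term is accumulated directly into C, so no
// temporary matrix for A*B is ever materialized.
void gemm(int n, double a, const std::vector<double>& A,
          const std::vector<double>& B, double b, std::vector<double>& C)
{
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double dot = 0.0;                 // (A*B)[i][j]
            for (int k = 0; k < n; ++k)
                dot += A[i*n + k] * B[k*n + j];
            C[i*n + j] = a * dot + b * C[i*n + j];
        }
}
```

Written as separate overloaded '*' and '+' calls, the same expression would allocate a full n x n temporary for A*B, which is exactly the expression-level problem discussed in this thread.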
Jul 10 2009
Lars T. Kyllingstad wrote:Don wrote:Curiously, in the DMD front-end, that's almost what happens with array operations. x[] = a*y[] + c*z[]; gets translated into: __somemangledarrayfuncname(x, y, z, a, c); and creates: __somemangledarrayfuncname(T[] p1, T[] p2, T[] p3, T c1, T c2) { for(int p=0; p < p1.length; ++p) { p1[p] = c1*p2[p] + c2*p3[p]; } } And there are many bugs associated with this, since the compiler doesn't distinguish x[] from x properly (where x is a dynamic array); you can get internal compiler errors referring to type mismatches of 'p'. This could, I think, be done better by putting a simple version of BLADE in the compiler runtime. A big issue with matrix overloading is, what do you do with the dot product? It's just as fundamental as '+' or '*', but it doesn't have an operator symbol. Consider that a 1x5 matrix multiplied by a 5x1 matrix is just a dot product of two vectors, and a smart compiler would be able to recognize that. The other thing that's desperately missing from D is multi-dimensional indexing.Andrei Alexandrescu wrote:That is true. There is, for instance, a good reason why the basic BLAS matrix multiplication routine calculates a A B + b C (a,b: scalars; A,B,C: matrices) instead of just AB. Would/could one gain something, performance-wise, by having such "expression overloading" as a built-in feature of the language itself, rather than as a library? BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -LarsWalter Bright wrote:Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. 
That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.Andrei Alexandrescu wrote:It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. AndreiThis also reminds me that the entire operator overloading feature must be thrown away and redesigned.:-(
Jul 10 2009
Don wrote:Lars T. Kyllingstad wrote:There are actually three (four) basic types of vector/matrix multiplication, and the * operator would more or less be fitting for any of them: - element-by-element multiplication, which is what * means now - dot product - matrix multiplication (- cross product ) I wish more operators were easily accessible on keyboards. (Hey, I just found out that "×" is Shift-AltGr-* on my keboard!) Perhaps the modulus operator, %, could be reused for one of them? If you drink a few beers, add some goodwill, and perhaps squint a little, it almost looks like a cross. ;)Don wrote:Curiously, in the DMD front-end, that's almost what happens with array operations. x[] = a*y[] + c*z[]; gets translated into: __somemangledarrayfuncname(x, y, z, a, c); and creates: __somemangledarrayfuncname(T[] p1, T[] p2, T[] p3, T c1, T c2) { for(int p=0; p < p1.length; ++p) { p1[p] = c1*p2[p] + c2*p3[p]; } } And there are many bugs associated with this, since the compiler doesn't distinguish x[] from x properly (where x is a dynamic array); you can get internal compiler errors referring to type mismatches of 'p'. This could, I think, be done better by putting a simple version of BLADE in the compiler runtime. A big issue with matrix overloading is, what do you with dot product? It's just as fundamental as '+' or '*', but it doesn't have an operator symbol. Consider that a 1x5 matrix multiplied by a 5x1 matrix is just a dot product of two vectors, and a smart compiler would be able to recognize that.Andrei Alexandrescu wrote:That is true. There is, for instance, a good reason why the basic BLAS matrix multiplication routine calculates a A B + b C (a,b: scalars; A,B,C: matrices) instead of just AB. Would/could one could gain something, performance-wise, by having such "expression overloading" as a built-in feature of the language itself, rather than as a library? 
BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -LarsWalter Bright wrote:Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.Andrei Alexandrescu wrote:It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. AndreiThis also reminds me that the entire operator overloading feature must be thrown away and redesigned.:-(The other thing that's desperately missing from D is multi-dimensional indexing.Agreed. But with the aforementioned "expression overloading", one could make extremely elegant multidimensional library types... -Lars
Jul 10 2009
On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:There are actually three (four) basic types of vector/matrix multiplication, and the * operator would more or less be fitting for any of them: - element-by-element multiplication, which is what * means now - dot product - matrix multiplication (- cross product )Actually, matrix multiplication and the dot product are both cases of the inner product and the cross product is specific to 3D vectors.
Jul 11 2009
Robert Jacques wrote:On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:And you can add the outer product... Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.frThere are actually three (four) basic types of vector/matrix multiplication, and the * operator would more or less be fitting for any of them: - element-by-element multiplication, which is what * means now - dot product - matrix multiplication (- cross product )Actually, matrix multiplication and the dot product are both cases of the inner product and the cross product is specific to 3D vectors.
Jul 11 2009
Robert Jacques wrote:On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:Actually, the dot product is both a special case of matrix multiplication and an inner product. Matrix multiplication in general is not an inner product, since an inner product always maps two vectors to a scalar. That aside, my point was simply that there are several operations for which one may want to use the '*' operator, and there is only one '*'. :) -LarsThere are actually three (four) basic types of vector/matrix multiplication, and the * operator would more or less be fitting for any of them: - element-by-element multiplication, which is what * means now - dot product - matrix multiplication (- cross product )Actually, matrix multiplication and the dot product are both cases of the inner product and the cross product is specific to 3D vectors.
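[Editor's note: to make the clash concrete, here is a small C++ sketch of the distinct products Lars lists; the helper names are made up for illustration and belong to no particular library. Each is a reasonable claimant for '*', yet they take and return different things:]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-by-element product: what '*' means for D array ops today.
std::vector<double> elementwise(const std::vector<double>& x,
                                const std::vector<double>& y)
{
    std::vector<double> r(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) r[i] = x[i] * y[i];
    return r;
}

// Dot product: a 1xN matrix times an Nx1 matrix, i.e. a scalar result.
double dot(const std::vector<double>& x, const std::vector<double>& y)
{
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// Cross product: only defined for 3-vectors, as noted downthread.
std::vector<double> cross(const std::vector<double>& x,
                          const std::vector<double>& y)
{
    return { x[1]*y[2] - x[2]*y[1],
             x[2]*y[0] - x[0]*y[2],
             x[0]*y[1] - x[1]*y[0] };
}
```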
Jul 13 2009
Lars T. Kyllingstad wrote:Robert Jacques wrote:You could always convert to using downs' code: a /inner/ b a /dot/ b a /cross/ b As evil as it is... there's something strangely alluring about being able to define your own infix operators...On Fri, 10 Jul 2009 06:24:04 -0400, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:Actually, the dot product is both a special case of matrix multiplication and an inner product. Matrix multiplication in general is not an inner product, since an inner product always associates two vectors with a scalar. That aside, my point was simply that there are several operations for which one may want to use the '*' operator, and there is only one '*'. :) -LarsThere are actually three (four) basic types of vector/matrix multiplication, and the * operator would more or less be fitting for any of them: - element-by-element multiplication, which is what * means now - dot product - matrix multiplication (- cross product )Actually, matrix multiplication and the dot product are both cases of the inner product and the cross product is specific to 3D vectors.
Jul 13 2009
Don wrote:The other thing that's desperately missing from D is multi-dimensional indexing.What are the limitations of multiple-argument []? Andrei
Jul 10 2009
On Fri, Jul 10, 2009 at 8:15 AM, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:Don wrote:I don't know what Don had in mind, but multi-dimensional slice is ugly. should be A[ a..b, c..d, e..f ] but currently you have to do something like A[ [a,c,e]..[b,d,f] ]. Or A[a..b][c..d][e..f] and bend over backwards to avoid creating too many temporary proxy objects. Also you'd like to be able to mix and match slices and indices like A[ a..b, 3, e..f ]. Also a basic built in multi-dim array would be nice. I think most of the time when people make a new T(10,20) they don't really want 10 arrays of length 20, each of which can be individually resized (and must be if more capacity is needed). They want one array with size 10 x 20. --bbThe other thing that's desperately missing from D is multi-dimensional indexing.What are the limitations of multiple-argument []? Andrei
Jul 10 2009
Bill Baxter wrote:On Fri, Jul 10, 2009 at 8:15 AM, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:Either, D needs built-in multidimensional arrays, or they must be implemented in Phobos. To support slicing of built-in multidim arrays, I think it is necessary to make slices a different type than arrays, because they would need to have a stride in order to slice across rows. -LarsDon wrote:I don't know what Don had in mind, but multi-dimensional slice is ugly. should be A[ a..b, c..d, e..f ] but currently you have to do something like A[ [a,c,e]..[b,d,f] ]. Or A[a..b][c..d][e..f] and bend over backwards to avoid creating too many temporary proxy objects. Also you'd like to be able to mix and match slices and indices like A[ a..b, 3, e..f ]. Also a basic built in multi-dim array would be nice. I think most of the time when people make a new T(10,20) they don't really want 10 arrays of length 20, each of which can be individually resized (and must be if more capacity is needed). They want one array with size 10 x 20. --bbThe other thing that's desperately missing from D is multi-dimensional indexing.What are the limitations of multiple-argument []? Andrei
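[Editor's note: a sketch of Lars's stride argument, written in C++ since a D slice is just (pointer, length) and cannot express it. A 2-D view whose sub-slices keep the original row stride is no longer contiguous, which is why multidimensional slices need a richer type. All names here are invented for illustration:]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A 2-D view into a flat row-major buffer. Slicing across rows keeps
// the *original* row stride, so the resulting view is non-contiguous --
// exactly what a plain (ptr, length) slice cannot represent.
struct MatView {
    double* ptr;
    std::size_t rows, cols;
    std::size_t rowStride;   // elements per row of the underlying buffer

    double& at(std::size_t i, std::size_t j) { return ptr[i * rowStride + j]; }

    // View of rows [r0, r1) and columns [c0, c1); no data is copied.
    MatView sub(std::size_t r0, std::size_t r1,
                std::size_t c0, std::size_t c1) {
        return { ptr + r0 * rowStride + c0, r1 - r0, c1 - c0, rowStride };
    }
};
```

The sub-view's rowStride still reflects the parent buffer, which is the piece of information a separate slice type would have to carry.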
Jul 13 2009
Bill Baxter wrote:On Fri, Jul 10, 2009 at 8:15 AM, Andrei Alexandrescu<SeeWebsiteForEmail erdani.org> wrote:And I don't think the latter option is necessarily correct. If A is an array type, A[a..b][c..d] currently means assert(d-c <= b-a), A[a+c..a+(d-c)], with $=b while evaluating d and c. Also you'd like to be able to mix and match slices andDon wrote:I don't know what Don had in mind, but multi-dimensional slice is ugly. should be A[ a..b, c..d, e..f ] but currently you have to do something like A[ [a,c,e]..[b,d,f] ]. Or A[a..b][c..d][e..f] and bend over backwards to avoid creating too many temporary proxy objects.The other thing that's desperately missing from D is multi-dimensional indexing.What are the limitations of multiple-argument []? Andreiindices like A[ a..b, 3, e..f ].This one, in particular, is what I had in mind.Also a basic built in multi-dim array would be nice. I think most of the time when people make a new T(10,20) they don't really want 10 arrays of length 20, each of which can be individually resized (and must be if more capacity is needed). They want one array with size 10 x 20.It's a bit trickier to provide that (other than just providing an operator-overloaded opIndex) since you'd then need to limit slicing on it, or else provide strided slices as a built-in type, as well. Maybe that's better as a library type. Dunno. Definitely we need a more powerful opIndex().
Jul 13 2009
On Fri, Jul 10, 2009 at 12:45 PM, Bill Baxter<wbaxter gmail.com> wrote:Also a basic built in multi-dim array would be nice. I think most of the time when people make a new T(10,20) they don't really want 10 arrays of length 20, each of which can be individually resized (and must be if more capacity is needed). They want one array with size 10 x 20.Something like C#'s would be nice: http://msdn.microsoft.com/en-us/library/aa288453(VS.71).aspx If we were to D-ize them, they might look something like: int[,] a = new int[,](3, 4); a[,] = 5; // slice-assign entire array a[2,] = 10; // slice-assign only the third row for(int i = 0; i < a.length(0); i++) // notice .length(0) for(int j = 0; j < a.length(1); j++) writeln(a[i, j]); auto b = a.dup; a[0,] = b[1,]; // copy b's second row into a's first - hmm Slicing them might be tricky, because I'm not sure what type the slice should be. I think maybe a[0, ] should be int[], but what the heck would a[,0] be? int[,4] ? It'd require a lot more thought.
Jul 10 2009
On Fri, 10 Jul 2009 11:15:15 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Don wrote:Don might be referring to the $ operator, but that more applies to slicing. Personally, I find the lack of multi-dimensional slicing the bigger gap: i.e. matrix[[1,2]..[matrix.lengths[0]-3,5]] vs matrix[1..$-3, 2..5]The other thing that's desperately missing from D is multi-dimensional indexing.What are the limitations of multiple-argument []? Andrei
Jul 10 2009
Lars T. Kyllingstad wrote:Don wrote:Andrei Alexandrescu wrote:Walter Bright wrote:Indeed, I even think that the concept of operator overloading is wrong.Andrei Alexandrescu wrote:It's run its course like an old BMW. We need to do new things, and bolting them on what we have won't work. AndreiThis also reminds me that the entire operator overloading feature must be thrown away and redesigned.:-(In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for, is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.That is true. There is, for instance, a good reason why the basic BLAS matrix multiplication routine calculates a A B + b C (a,b: scalars; A,B,C: matrices) instead of just AB. Would/could one gain something, performance-wise, by having such "expression overloading" as a built-in feature of the language itself, rather than as a library? BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -LarsActually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jul 11 2009
"Jérôme M. Berger" wrote: (...)Someone correct me if I'm wrong, but I think what Blade does is a bit more advanced than FLENS. Blade performs optimizations on the AST level and generates (near) optimal assembly at compile time. I couldn't find info on what FLENS does exactly beyond inlining through template expressions, but from the looks of it it doesn't do any of the AST level optimizations Blade does. Anyone care to provide more info? Can Blade also generate better asm than is possible with libraries such as FLENS?BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -LarsActually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... Jerome
Jul 11 2009
On Sat, 11 Jul 2009 06:14:56 -0400, Lutger <lutger.blijdestijn gmail.com> wrote:"Jérôme M. Berger" wrote: (...)Flens (and several other libraries like it) just provide a syntactic sugar for BLAS using expression templates. So calling something like a = b + c + d gets transformed into (if you're lucky) a = b + c; a = a + d; (and if you're unlucky) temp = b + c; a = temp + d; So you end up looping through memory multiple times. The next best option are expression templates, (Blitz or Boost come to mind) which encode the arguments of each operation into a struct. This results in a lot of temporaries and is a major performance hit for small-ish vectors, but you only loop through memory once and make no allocations, which is a win on larger vectors. Then you have BLADE and D array ops which don't create any temporaries and are faster still. The counter is that a BLAS library can be tuned for each specific CPU or GPU and select the fastest library at runtime.Someone correct me if I'm wrong, but I think what Blade does is a bit more advanced than FLENS. Blade performs optimizations on the AST level and generates (near) optimal assembly at compile time. I couldn't find info on what FLENS does exactly beyond inlining through template expressions, but from the looks of it it doesn't do any of the AST level optimizations Blade does. Anyone care to provide more info? Can Blade also generate better asm than is possible with libraries such as FLENS?BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -LarsActually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... Jerome
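[Editor's note: for readers unfamiliar with the technique, here is a stripped-down sketch of the Blitz/Boost-style expression templates Robert describes. The types are toys invented for this example, not FLENS's actual API: 'b + c + d' builds a lightweight tree of references at compile time, and only the assignment runs a loop, once, with no temporary vectors allocated:]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec;

// Node encoding "l + r"; holds references only, allocates nothing.
template <class L, class R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

struct Vec {
    std::vector<double> data;
    double operator[](std::size_t i) const { return data[i]; }
    template <class E>
    Vec& operator=(const E& e) {  // the single evaluation loop over the tree
        for (std::size_t i = 0; i < data.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// b + c builds Sum<Vec,Vec>; (b + c) + d builds Sum<Sum<Vec,Vec>,Vec>.
inline Sum<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }
template <class L, class R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& l, const Vec& r) { return {l, r}; }
```

This is the "encode the arguments of each operation into a struct" scheme: the per-expression struct nesting is the temporary-object overhead Robert mentions for small vectors, while the single fused loop is the win for large ones.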
Jul 12 2009
Robert Jacques wrote:On Sat, 11 Jul 2009 06:14:56 -0400, Lutger <lutger.blijdestijn gmail.com> wrote:"Jérôme M. Berger" wrote: (...)BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. :) -LarsActually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... JeromeSomeone correct me if I'm wrong, but I think what Blade does is a bit more advanced than FLENS. Blade performs optimizations on the AST level and generates (near) optimal assembly at compile time. I couldn't find info on what FLENS does exactly beyond inlining through template expressions, but from the looks of it it doesn't do any of the AST level optimizations Blade does. Anyone care to provide more info? Can Blade also generate better asm than is possible with libraries such as FLENS?Flens (and several other libraries like it) just provide a syntactic sugar for BLAS using expression templates. So calling something like a = b + c + d gets transformed into (if you're lucky) a = b + c; a = a + d; (and if you're unlucky) temp = b + c; a = temp + d;The way I understand it, FLENS will never take the second solution. If it cannot do the first with BLAS, it will fail to compile in order to let you choose the best way to decompose your operation.So you end up looping through memory multiple times. The next best option are expression templates, (Blitz or Boost come to mind) which encode the arguments of each operation into a struct. This results in a lot of temporaries and is a major performance hit for small-ish vectors, but you only loop through memory once and make no allocations, which is a win on larger vectors. Then you have BLADE and D array ops which don't create any temporaries and are faster still. The counter is that a BLAS library can be tuned for each specific CPU or GPU and select the fastest library at runtime.Actually, depending on your architecture, amount of data and number of operands, looping through memory multiple times may in theory be the best option in some cases. This is because it will use the cache (and the swap) more efficiently than a solution that loops only once through all arrays at the same time. Jerome -- mailto:jeberger free.fr http://jeberger.free.fr Jabber: jeberger jabber.fr
Jul 12 2009
On Sun, 12 Jul 2009 11:14:02 -0400, Jérôme M. Berger <jeberger free.fr> wrote:Robert Jacques wrote:For arrays this is never, ever the case. Array ops are memory bandwidth bound. (Okay, with enough operands you could mess up the cache, but 100s of operands are pathological) What you're probably thinking about are the matrix multiply algorithms, but these algorithms inherently loop through memory multiple times. It's not something 'added' by the limitation of a fixed API.On Sat, 11 Jul 2009 06:14:56 -0400, Lutger <lutger.blijdestijn gmail.com> wrote:The way I understand it, FLENS will never take the second solution. If it cannot do the first with BLAS, it will fail to compile in order to let you choose the best way to decompose your operation."Jérôme M. Berger" wrote: (...)Flens (and several other libraries like it) just provide a syntactic sugar for BLAS using expression templates. So calling something like a = b + c + d gets transformed into (if you're lucky) a = b + c; a = a + d; (and if you're unlucky) temp = b + c; a = temp + d;Someone correct me if I'm wrong, but I think what Blade does is a bit more advanced than FLENS. Blade performs optimizations on the AST level and generates (near) optimal assembly at compile time. I couldn't find info on what FLENS does exactly beyond inlining through template expressions, but from the looks of it it doesn't do any of the AST level optimizations Blade does. Anyone care to provide more info? Can Blade also generate better asm than is possible with libraries such as FLENS?BLADE has already shown that it is possible to do stuff like this in a library, but I think it goes without saying that if it was built into the language the syntax could be made considerably nicer. Compare: auto m = MatrixOp!("a*A*B + b*C")(aVal, bVal, aMtrx, bMtrx, cMtrx); auto m = a*A*B + b*C; If D could do this, I think it would become the next FORTRAN. 
:) -LarsActually, this has already been done in C++: http://flens.sourceforge.net/ It should be possible to port it to D... JeromeSo you end up looping through memory multiple times. The next best option is expression templates, (Blitz or Boost come to mind) which encode the arguments of each operation into a struct. This results in a lot of temporaries and is a major performance hit for small-ish vectors, but you only loop through memory once and make no allocations, which is a win on larger vectors. Then you have BLADE and D array ops which don't create any temporaries and are faster still. The counter is that a BLAS library can be tuned for each specific CPU or GPU and select the fastest library at runtime.Actually, depending on your architecture, amount of data and number of operands, looping through memory multiple times may in theory be the best option in some cases. This is because it will use the cache (and the swap) more efficiently than a solution that loops only once through all arrays at the same time.
Jul 12 2009
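The expression-template technique discussed above can be sketched in a few lines of C++. This is illustrative code only (the type and operator names are invented, not taken from FLENS, Blitz, or Boost): `operator+` returns a lightweight proxy instead of a freshly allocated vector, so `a = b + c + d` builds a small expression tree and the single loop over memory runs at assignment time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec;

// Proxy node representing an elementwise sum. It holds references to its
// operands and computes elements lazily; no temporary vector is allocated.
template <typename L, typename R>
struct Sum {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double  operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i)       { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assignment from any expression: this is the single pass through memory.
    template <typename E>
    Vec& operator=(const E& e) {
        for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
        return *this;
    }
};

// Vec + Vec and (expression) + Vec both yield proxies, not results.
inline Sum<Vec, Vec> operator+(const Vec& l, const Vec& r) { return {l, r}; }

template <typename L, typename R>
Sum<Sum<L, R>, Vec> operator+(const Sum<L, R>& l, const Vec& r) { return {l, r}; }
```

Note the classic expression-template caveat: the proxies hold references, so an expression must be consumed within the statement that builds it, which is exactly why these libraries pay a cost in temporaries and complexity for small vectors.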
Don wrote:Indeed, I even think that the concept of operator overloading is wrong. In C++, operator overloading was just for syntax sugar. That's wrong: pretty much everything that you want overloaded operators for is performance-critical. And that implies you need to deal on the level of complete expressions, not individual operations.I disagree for two reasons: 1. Operator syntax /is/ just syntax sugar for function calls, which is a concept orthogonal to expression-level optimization. Consider the following expressions: a + b + c add(add(a, b), c) If one of these is reduced to a single function call, both should be. 2. Expression-level optimization is the compiler's job, not the programmer's. -- Rainer Deyke - rainerd eldwood.com
Jul 10 2009
Reply to Rainer,2. Expression-level optimization is the compiler's job, not the programmer's.Some (many?) of the interesting optimizations that can be applied to expressions with overloaded operators are not algebraic in nature. Many of the use cases for operator overloading don't work if the compiler is allowed to assume all the same identities it can for scalars (e.g. commutativity doesn't hold for matrix multiplication). Unless you want to make a way for the programmer to both add and remove expression-level optimizations, and require an optimization engine powerful enough to use them, you can't do optimization of expressions with overloaded operators purely in the compiler.
Jul 10 2009
BCS wrote:Reply to Rainer,2. Expression-level optimization is the compiler's job, not the programmer's.Some (many?) of the interesting optimizations that can be applied to expressions with overloaded operators are not algebraic in nature.I'm not sure what you mean by "algebraic" here, but...Many of the use cases for operator overloading don't work if the compiler is allowed to assume all the same identities it can for scalars (e.g. commutativity doesn't hold for matrix multiplication).The compiler shouldn't make any assumptions. The compiler doesn't /need/ to make any assumptions, since the compiler has access to the actual code. For example, take a simple vector expression, with 3-element vectors: a = cross_product(b, c + d + e) First the compiler rewrites this to use a temporary variable: tmp = c + d tmp = tmp + e a = cross_product(b, tmp) Then the compiler inlines both functions and unrolls the loops: tmp[0] = c[0] + d[0] tmp[1] = c[1] + d[1] tmp[2] = c[2] + d[2] tmp[0] = tmp[0] + e[0] tmp[1] = tmp[1] + e[1] tmp[2] = tmp[2] + e[2] a[0] = b[1] * tmp[2] - b[2] * tmp[1] a[1] = b[2] * tmp[0] - b[0] * tmp[2] a[2] = b[0] * tmp[1] - b[1] * tmp[0] The compiler then reduces the temporary vector to a set of temporary scalars, probably stored in registers: tmp0 = c[0] + d[0] + e[0] tmp1 = c[1] + d[1] + e[1] tmp2 = c[2] + d[2] + e[2] a[0] = b[1] * tmp2 - b[2] * tmp1 a[1] = b[2] * tmp0 - b[0] * tmp2 a[2] = b[0] * tmp1 - b[1] * tmp0 The compiler sees no further optimizations, so it leaves the code as is. This is about as good as a human optimizer could have done. 
If the CPU has built-in support for vector cross-multiplication, the compiler can use this at the code generation stage.Unless you want to make a way for the programmer to both add and remove expression-level optimizations, and require an optimization engine powerful enough to use them, you can't do optimization of expressions with overloaded operators purely in the compiler.Why should the programmer need to give hints to the compiler when the compiler already has enough information to figure out how to optimize properly? -- Rainer Deyke - rainerd eldwood.com
Jul 10 2009
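Rainer's fully reduced form can be checked directly. The sketch below uses an invented helper name (`cross_sum`); it writes out the same computation by hand, with the element sums held in scalars (the "registers") and the cross-product terms in the correct form `b[i]*t[j] - b[j]*t[i]`:

```cpp
#include <array>
#include <cassert>

using V3 = std::array<double, 3>;

// Hand-written version of the fully reduced code from the post above:
// computes a = cross_product(b, c + d + e) with no temporary vector.
V3 cross_sum(const V3& b, const V3& c, const V3& d, const V3& e) {
    const double t0 = c[0] + d[0] + e[0];
    const double t1 = c[1] + d[1] + e[1];
    const double t2 = c[2] + d[2] + e[2];
    return { b[1] * t2 - b[2] * t1,
             b[2] * t0 - b[0] * t2,
             b[0] * t1 - b[1] * t0 };
}
```

A quick sanity check is the unit-vector identity x-hat cross y-hat = z-hat, which the test below exercises.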
On Fri, Jul 10, 2009 at 10:18 PM, Rainer Deyke<rainerd eldwood.com> wrote:BCS wrote:Reply to Rainer,I'm not sure what you mean by "algebraic" here, but...2. Expression-level optimization is the compiler's job, not the programmer's.Some (many?) of the interesting optimizations that can be applied to expressions with overloaded operators are not algebraic in nature.Many of the use cases for operator overloading don't work if the compiler is allowed to assume all the same identities it can for scalars (e.g. commutativity doesn't hold for matrix multiplication).The compiler shouldn't make any assumptions. The compiler doesn't /need/ to make any assumptions, since the compiler has access to the actual code. For example, take a simple vector expression, with 3-element vectors: a = cross_product(b, c + d + e) First the compiler rewrites this to use a temporary variable: tmp = c + d tmp = tmp + e a = cross_product(b, tmp) Then the compiler inlines both functions and unrolls the loops: tmp[0] = c[0] + d[0] tmp[1] = c[1] + d[1] tmp[2] = c[2] + d[2] tmp[0] = tmp[0] + e[0] tmp[1] = tmp[1] + e[1] tmp[2] = tmp[2] + e[2] a[0] = b[1] * tmp[2] - b[2] * tmp[1] a[1] = b[2] * tmp[0] - b[0] * tmp[2] a[2] = b[0] * tmp[1] - b[1] * tmp[0] The compiler then reduces the temporary vector to a set of temporary scalars, probably stored in registers: tmp0 = c[0] + d[0] + e[0] tmp1 = c[1] + d[1] + e[1] tmp2 = c[2] + d[2] + e[2] a[0] = b[1] * tmp2 - b[2] * tmp1 a[1] = b[2] * tmp0 - b[0] * tmp2 a[2] = b[0] * tmp1 - b[1] * tmp0 The compiler sees no further optimizations, so it leaves the code as is. This is about as good as a human optimizer could have done. If the CPU has built-in support for vector cross-multiplication, the compiler can use this at the code generation stage.You're somewhat wrong on two points: 1) The compiler doesn't always have the code. 
If you're linking against a library, where the operator overloads are not given in the headers, the compiler doesn't have their code. Link-time optimization might be able to help here, but without the original syntax tree to work with, it's probably not going to be able to do the same things. 2) The compiler doesn't necessarily have enough information to optimize properly. Virtual calls destroy any guarantees. OK, sure, you could say "vector and matrix types really should be structs and therefore not use virtual calls," but in the general case, the compiler can't figure these things out. You're right, however, that in the vast majority of cases, the compiler *could* optimize calls to these functions. But you still might need some way of informing the compiler "hey! I really _really_ would like you to inline this."Unless you want to make a way for the programmer to both add and remove expression-level optimizations, and require an optimization engine powerful enough to use them, you can't do optimization of expressions with overloaded operators purely in the compiler.Why should the programmer need to give hints to the compiler when the compiler already has enough information to figure out how to optimize properly?
Jul 10 2009
Jarrett Billingsley wrote:1) The compiler doesn't always have the code. If you're linking against a library, where the operator overloads are not given in the headers, the compiler doesn't have their code. Link-time optimization might be able to help here, but without the original syntax tree to work with, it's probably not going to be able to do the same things. 2) The compiler doesn't necessarily have enough information to optimize properly. Virtual calls destroy any guarantees. OK, sure, you could say "vector and matrix types really should be structs and therefore not use virtual calls," but in the general case, the compiler can't figure these things out.Yes and no in both cases. Yes, the way DMD is set up makes these kinds of optimizations exceedingly difficult. Cross-library function calls and virtual function calls are slow. I tend to assume that all performance-critical code is templated anyway, which is incompatible with both libraries and virtual functions. Better to move as many calculations as possible from run time to compile time. On the other hand, JIT compilers for fully dynamic languages can perform these types of optimizations. A clever D implementation could do the same.You're right, however, that in the vast majority of cases, the compiler *could* optimize calls to these functions. But you still might need some way of informing the compiler "hey! I really _really_ would like you to inline this."I'm not really opposed to the idea of optional optimization hints, but I'd like to see a compiler clever enough to not need them. -- Rainer Deyke - rainerd eldwood.com
Jul 10 2009
Hello Rainer,When that happens we can all retire. :)You're right, however, that in the vast majority of cases, the compiler *could* optimize calls to these functions. But you still might need some way of informing the compiler "hey! I really _really_ would like you to inline this."I'm not really opposed to the idea of optional optimization hints, but I'd like to see a compiler clever enough to not need them.
Jul 11 2009
Hello Rainer,BCS wrote:Algebraic optimizations would be optimizations that work by applying stuff you learned in algebra class.Reply to Rainer,I'm not sure what you mean by "algebraic" here, but...2. Expression-level optimization is the compiler's job, not the programmer's.Some (many?) of the interesting optimizations that can be applied to expressions with overloaded operators are not algebraic in nature.[...] Ok try another case: y = a*b + c*a; for scalars, the compiler can (and should) convert that 3 ops to 2: y = a*(b+c); for a matrix you can't. And I'm sure there are matrix cases where you can get improvements by doing algebraic manipulations before you inline that would be darn near impossible to get the compiler to figure out after you inline.Many of the use cases for operator overloading don't work if the compiler is allowed to assume all the same identities it can for scalars (e.g. commutativity doesn't hold for matrix multiplication).The compiler shouldn't make any assumptions. The compiler doesn't /need/ to make any assumptions, since the compiler has access to the actual code. For example, take a simple vector expression, with 3-element vectors:Because the compiler doesn't have enough information. I've tried to work with systems that are supposed to do math all by themselves. They didn't work well. Computational Algebra Systems don't work without hints.Unless you want to make a way for the programmer to both add and remove expression-level optimizations, and require an optimization engine powerful enough to use them, you can't do optimization of expressions with overloaded operators purely in the compiler.Why should the programmer need to give hints to the compiler when the compiler already has enough information to figure out how to optimize properly?
Jul 11 2009
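BCS's example is easy to verify numerically. With the small illustrative helpers below (invented for this check, not from any library mentioned in the thread), y = a*b + c*a and the scalar-style factoring y = a*(b+c) give different results for 2x2 matrices, precisely because c*a != a*c:

```cpp
#include <array>
#include <cassert>

using M2 = std::array<std::array<double, 2>, 2>;

// Plain 2x2 matrix product.
M2 mul(const M2& x, const M2& y) {
    M2 r{};  // zero-initialized
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int k = 0; k < 2; ++k)
                r[i][j] += x[i][k] * y[k][j];
    return r;
}

// Elementwise 2x2 matrix sum.
M2 add(const M2& x, const M2& y) {
    M2 r{};
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            r[i][j] = x[i][j] + y[i][j];
    return r;
}
```

Taking a = [[0,1],[0,0]] and b = c = [[0,0],[1,0]] gives a*b + c*a = [[1,0],[0,1]] but a*(b+c) = [[2,0],[0,0]], confirming that the factoring is invalid once multiplication stops commuting.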
BCS wrote:Ok try another case: y = a*b + c*a; for scalars, the compiler can (and should) convert that 3 ops to 2: y = a*(b+c); for a matrix you can't. And I'm sure there are matrix cases where you can get improvements by doing algebraic manipulations before you inline that would be darn near impossible to get the compiler to figure out after you inline.There are two possible cases: - The algebraically manipulated expression is functionally equivalent to the original expression. In this case, the optimizer should produce the exact same code for both expressions, so the high-level algebraic manipulation is unnecessary. (Yes, I'm asking a lot from the optimizer.) - The algebraically manipulated expression is /not/ functionally equivalent to the original expression. In this case the algebraic manipulation is invalid, and should not be performed. Optimizing complex expressions is not algorithmically more difficult than optimizing simple expressions, it just takes more memory and CPU time. -- Rainer Deyke - rainerd eldwood.com
Jul 11 2009
Hello Rainer,BCS wrote:Now that's a bit of understatment :)Ok try another case: y = a*b + c*a; for scalers, the compiler can (and should) convert that 3 ops to 2: y = a*(b+c); for a matrix you can't. And I'm sure there are matrix cases where you can get improvements by doing algebraic manipulations before you inline that would be darn near impossible to get the compiler to figure out after you inline.There are two possible cases: - The algebraically manipulated expression is functionally equivalent to the original expression. In this case, the optimizer should produce the exact same code whether for both expressions, so the high-level algebraic manipulation is unnecessary. (Yes, I'm asking a lot from the optimizer.)- The algebraically manipulated expression is /not/ functionally equivalent to the original expression. In this case the algebraic manipulation is invalid, and should not be performed.Duh :)Optimizing complex expressions is not algorithmically more difficult than optimizing simple expressions, it just takes more memory and CPU time.My argument is coming from my understanding of what Andrei Alexandrescu was proposing. I think he is forwarding that operator overloading be done via AST macros. The end effect could be something like my backmath library in the it could manipulate expression under user control before the optimizer gets it crack at things.
Jul 11 2009
Curiously, in the DMD front-end, that's almost what happens with array operations. x[] = a*y[] + c*z[]; gets translated into: __somemangledarrayfuncname(x, y, z, a, c); and creates: __somemangledarrayfuncname(T[] p1, T[] p2, T[] p3, T c1, T c2) { for(size_t p=0; p < p1.length; ++p) { p1[p] = c1*p2[p] + c2*p3[p]; } }Wow, I didn't know that. I think such a behaviour is not advertised enough. It's valuable since making a single loop for such things optimizes cache usage and speeds things up a lot. In C++ you would have to use a templated library like Eigen to do something similar, and then your code would become unreadable.
Jul 12 2009
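For readers unfamiliar with D's array operations, the fused loop Don describes can be rendered in C++ roughly as follows (the function name here is a stand-in for the compiler-mangled name in his post); the point is that x[] = a*y[] + c*z[] becomes a single pass over memory with no temporary arrays:

```cpp
#include <cassert>
#include <cstddef>

// C++ sketch of the loop the DMD front-end is described as synthesizing
// for `x[] = a*y[] + c*z[]`: one fused pass, no intermediate allocations.
void fused_array_op(double* x, const double* y, const double* z,
                    double a, double c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        x[i] = a * y[i] + c * z[i];
}
```

This is the same shape as a BLAS-style axpy kernel generalized to two scaled inputs, which is why the thread treats D's built-in array ops and BLADE as the no-temporaries baseline.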