D - F Language
- Mark Evans (17/17) Jul 01 2003 The only thing I wish to highlight is the 'slicing' and array manipulati...
- Sean L. Palmer (14/31) Jul 02 2003 I could sure use slicing functionality like this. Slices would have to ...
- Walter (3/5) Jul 02 2003 Would you like to take a stab at writing sum using SIMD?
- Sean L. Palmer (56/61) Jul 02 2003 I would be happy to. But it'll have to wait til at least tomorrow becau...
- Ilya Minkov (3/14) Jul 03 2003 hm... how about embedding something like this:
- Sean L. Palmer (8/10) Jul 03 2003 Wow, is all I can say.
- Mark Evans (5/5) Jul 03 2003 SIMD is not what the post was about. The semantics of array slicing and...
- Sean L. Palmer (8/13) Jul 03 2003 Right. It is however important to set up the semantics of the language ...
- Mark Evans (6/11) Jul 05 2003 OK, so how do back-end SIMD issues affect array semantics? All I gather...
- Sean L. Palmer (37/50) Jul 06 2003 Mostly I think what is important is that array operations are not expres...
- Mark Evans (6/6) Jul 07 2003 Yes the nice array semantics are desirable, but are tautology as an answ...
The only thing I wish to highlight is the 'slicing' and array manipulation, nothing beyond that necessarily. (FORTRAN is a column-major language, C is row-major; be careful.) http://www.fortran.com/F/linux.htm http://www.fortran.com/F/ "Array Language A sophisticated array language facilitates operations on whole arrays, contiguous and noncontiguous sections and slices of arrays. For example, arr(5:1:-2, 3, 6:) is a reference to the two-dimensional array created by taking the elements 5, 3, and 1 in the first dimension of arr and elements from 6 to the upper bound of the third dimension of arr, all in the 3rd plan of the array. If arr is a 5 by 6 by 7 array, the referenced elements would be (5,3,6), (3,3,6), (1,3,6), (5,3,7), (3,3,7), and (1,3,7). A simpler example shows calculating the sum inner product of a row and a column. A(i,j) = sum(B(i,:)*C(:,j)) Sum is one of the more than one hundred intrinsic procedures found in F."
Jul 01 2003
I could sure use slicing functionality like this. Slices would have to be capable of having a (possibly negative) stride for this to work Sum actually exists in Phobos/math2.d. ;) But it doesn't have an optimal implementation. We need SIMD in a bad way. Sean "Mark Evans" <Mark_member pathlink.com> wrote in message news:bdtrq6$1ohh$1 digitaldaemon.com...The only thing I wish to highlight is the 'slicing' and arraymanipulation,nothing beyond that necessarily. (FORTRAN is a column-major language, Cisrow-major; be careful.) http://www.fortran.com/F/linux.htm http://www.fortran.com/F/ "Array Language A sophisticated array language facilitates operations on whole arrays, contiguous and noncontiguous sections and slices of arrays. For example, arr(5:1:-2, 3, 6:) is a reference to the two-dimensional array created by taking the elements5, 3,and 1 in the first dimension of arr and elements from 6 to the upper boundofthe third dimension of arr, all in the 3rd plan of the array. If arr is a5 by 6by 7 array, the referenced elements would be (5,3,6), (3,3,6), (1,3,6),(5,3,7),(3,3,7), and (1,3,7). A simpler example shows calculating the sum inner product of a row and acolumn.A(i,j) = sum(B(i,:)*C(:,j)) Sum is one of the more than one hundred intrinsic procedures found in F."
Jul 02 2003
"Sean L. Palmer" <palmer.sean verizon.net> wrote in message news:bdv2h6$3vn$1 digitaldaemon.com...Sum actually exists in Phobos/math2.d. ;) But it doesn't have an optimal implementation. We need SIMD in a bad way.Would you like to take a stab at writing sum using SIMD?
Jul 02 2003
I would be happy to. But it'll have to wait til at least tomorrow because right now I had a few beers and am not coding so well. ;) sum isn't easy with SSE because there's no direct instruction support for it. But it can be done using roughly 1/4 the number of instructions required for FPU, if amortized over a large number of values. For small arrays, say, 4 elements, it still takes 2 add's and 2 shuffle instructions. Those would be required to sum up the result register at the end in any case. A library implementation would be sub-optimal for constant sized arrays. It'd be better off done as an intrinsic somewhat similar to how memcpy is usually implemented in professional grade compilers. Consider SSE code to Sum() a 16-byte aligned float[4] pointed to by eax, and store into a float pointed to by ebx: movaps xmm0, xmmword ptr [eax] ; xmm0 = x y z w movhlps xmm1, xmm0 ; xmm1 = z w ? ? addps xmm0, xmm1 ; xmm0 = (x+z) (y+w) ? ? shufps xmm1, xmm0, 1 ; xmm1 = (y+w) ? ? ? addss xmm0, xmm1 ; xmm0 = (x+y+z+w) ? ? ? movss dword ptr [ebx], xmm0 Obviously a canned library routine knowing nothing about the size or alignment will not be able to come even close to this kind of performance for such a small array. If the values start off and end up in registers, even better. Plus the optimizer can schedule the instructions wisely. The above has many dependency stall slots where other instructions can be inserted. Now to extend that to handle any sized array, you first sum up as much as possible in groups of 4 like so: mov eax, dword ptr [src] mov ecx, dword ptr [count] xorps xmm0,xmm0 ; xmm0 = 0 0 0 0 shr ecx,2 jz done sumloop: movaps xmm1, xmmword ptr [eax] ; obviously this can be unrolled and pipelined better add eax, 16 addps xmm0, xmm1 dec ecx jnz sumloop done: Ok, so maybe I actually can code a little even while drunk. ;) You know what they say: if you drink and code, statistically your program is more likely to crash. Then sum up the 4 partial results you're left with using the first code snippet, and handle any leftovers using movss/addss. You can use movss/addss to do the first part of a badly-aligned array as well, then once you find the alignment boundary, go to the fast aligned loop. I could definitely write up a somewhat less than optimal SSE or MMX implementation to get you started, but the above pretty much covers it. If you can write memcpy, you can write this. I guess lack of time is a major factor. Sean P.S. Whatever happened to all those MMX/SSE/3dNow functions Burton and I put together for you? "Walter" <walter digitalmars.com> wrote in message news:bdv7pb$9es$4 digitaldaemon.com..."Sean L. Palmer" <palmer.sean verizon.net> wrote in message news:bdv2h6$3vn$1 digitaldaemon.com...Sum actually exists in Phobos/math2.d. ;) But it doesn't have an optimal implementation. We need SIMD in a bad way.Would you like to take a stab at writing sum using SIMD?
Jul 02 2003
hm... how about embedding something like this: http://www.dcs.gla.ac.uk/~wpc/reports/compilers/compilerindex/x25.html In article <be0g0d$1itj$1 digitaldaemon.com>, Sean L. Palmer says...P.S. Whatever happened to all those MMX/SSE/3dNow functions Burton and I put together for you? "Walter" <walter digitalmars.com> wrote in message news:bdv7pb$9es$4 digitaldaemon.com..."Sean L. Palmer" <palmer.sean verizon.net> wrote in message news:bdv2h6$3vn$1 digitaldaemon.com...Sum actually exists in Phobos/math2.d. ;) But it doesn't have an optimal implementation. We need SIMD in a bad way.Would you like to take a stab at writing sum using SIMD?
Jul 03 2003
Wow, is all I can say. Why don't the compilers I have to use have this kind of stuff? I'm living in the friggin stone age. That might be enough to make me switch back to Pascal! ;) Nah, I would miss generics too much. Sean "Ilya Minkov" <Ilya_member pathlink.com> wrote in message news:be179r$2bk6$1 digitaldaemon.com...hm... how about embedding something like this: http://www.dcs.gla.ac.uk/~wpc/reports/compilers/compilerindex/x25.html
Jul 03 2003
SIMD is not what the post was about. The semantics of array slicing and 'views' were the motivation. That is what D needs fixed 'in a bad way.' It is premature to worry about SIMD and it's a back-end issue anyway, not important to the language as such. Mark
Jul 03 2003
Right. It is however important to set up the semantics of the language in such a way that efficient SIMD code generation is possible. Whether or not the early compilers emit SIMD, powerful, expressive array handling would be a huge plus in the long run. Sean "Mark Evans" <Mark_member pathlink.com> wrote in message news:be1n9b$2tn9$1 digitaldaemon.com...SIMD is not what the post was about. The semantics of array slicing and'views'were the motivation. That is what D needs fixed 'in a bad way.' It is premature to worry about SIMD and it's a back-end issue anyway, not important to the language as such. Mark
Jul 03 2003
Sean L. Palmer says...Right. It is however important to set up the semantics of the language in such a way that efficient SIMD code generation is possible. Whether or not the early compilers emit SIMD, powerful, expressive array handling would be a huge plus in the long run. SeanOK, so how do back-end SIMD issues affect array semantics? All I gather from your comments is that list operations should decompose into 4-number groups. Is that a killer? The F people have paid much attention to low-level performance issues such as address alignment. I do not see big problems ahead. Mark
Jul 05 2003
Mostly I think what is important is that array operations are not expressed in terms of iterative loops, but by more implicit, inherently parallel methods. Otherwise the compiler has to reconstruct the parallel form from the iterative form, which is difficult and painful, something not all compilers would be able to do well. So we need the ability to do operations on entire array slices at a time, to do operations on array slices vs. array slices, to compute functions on all array elements and store the results as an array, that kind of thing. a[] += b[] * c; is in the D spec but is currently unimplemented. F probably has done it right. The best thing I saw before that for array manipulation was Yorick ( http://wuarchive.wustl.edu/languages/yorick/doc/index.html ) but Yorick is intended for interpretation; it seems to be headed in the wrong direction. 4-number groups doesn't really have anything to do with it. If the arrays in question happen to be multiples of 4, leftovers operations will not be necessary, but you can SIMD-ify any size array operation. SIMD will only help (after the break-even point which is probably about size 3 or 4) and the larger the array the more it will help. From my experiences, once the array gets big enough that it won't all fit in the processor cache, performance drops off dramatically, (memory bandwidth costs start to dominate) but still is a win over FPU. An intelligent SIMD backend could in fact SIMD-ify any loop, not just array operations, by unrolling the loop 4 times and having each component of a SIMD vector contain a variable from one iteration of the loop. So it executes 4 iterations at a time. It could do this for any loop that doesn't have much branching, doesn't depend on the results of prior iterations, or for which the compiler can predict and control such dependency. But that has nothing whatsoever to do with array semantics. Sean "Mark Evans" <Mark_member pathlink.com> wrote in message news:be83ou$38p$1 digitaldaemon.com...Sean L. Palmer says...inRight. It is however important to set up the semantics of the languagenotsuch a way that efficient SIMD code generation is possible. Whether orbethe early compilers emit SIMD, powerful, expressive array handling wouldfroma huge plus in the long run. SeanOK, so how do back-end SIMD issues affect array semantics? All I gatheryour comments is that list operations should decompose into 4-numbergroups. Isthat a killer? The F people have paid much attention to low-levelperformanceissues such as address alignment. I do not see big problems ahead. Mark
Jul 06 2003
Yes the nice array semantics are desirable, but are tautology as an answer to the question. The original worry was that array semantics might prohibit SIMD. Now the claim is that SIMD has no impact on array semantics. Incidentally, a Google search on "array language" turns up many interesting links. Mark
Jul 07 2003