D - F Language

Mark Evans (17/17) Jul 01 2003 The only thing I wish to highlight is the 'slicing' and array manipulati...

Sean L. Palmer (14/31) Jul 02 2003 I could sure use slicing functionality like this. Slices would have to ...

Walter (3/5) Jul 02 2003 Would you like to take a stab at writing sum using SIMD?

Sean L. Palmer (56/61) Jul 02 2003 I would be happy to. But it'll have to wait til at least tomorrow becau...

Ilya Minkov (3/14) Jul 03 2003 hm... how about embedding something like this:

Sean L. Palmer (8/10) Jul 03 2003 Wow, is all I can say.

Mark Evans (5/5) Jul 03 2003 SIMD is not what the post was about. The semantics of array slicing and...

Sean L. Palmer (8/13) Jul 03 2003 Right. It is however important to set up the semantics of the language ...

Mark Evans (6/11) Jul 05 2003 OK, so how do back-end SIMD issues affect array semantics? All I gather...

Sean L. Palmer (37/50) Jul 06 2003 Mostly I think what is important is that array operations are not expres...

Mark Evans (6/6) Jul 07 2003 Yes the nice array semantics are desirable, but are tautology as an answ...

Mark Evans <Mark_member pathlink.com> writes:

The only thing I wish to highlight is the 'slicing' and array manipulation,
nothing beyond that necessarily.  (FORTRAN is a column-major language, C is
row-major; be careful.)

http://www.fortran.com/F/linux.htm
http://www.fortran.com/F/



"Array Language

A sophisticated array language facilitates operations on whole arrays,
contiguous and noncontiguous sections and slices of arrays. For example, 

arr(5:1:-2, 3, 6:)

is a reference to the two-dimensional array created by taking the elements 5, 3,
and 1 in the first dimension of arr and elements from 6 to the upper bound of
the third dimension of arr, all in the 3rd plan of the array. If arr is a 5 by 6
by 7 array, the referenced elements would be (5,3,6), (3,3,6), (1,3,6), (5,3,7),
(3,3,7), and (1,3,7). 

A simpler example shows calculating the sum inner product of a row and a column.


A(i,j) = sum(B(i,:)*C(:,j))

Sum is one of the more than one hundred intrinsic procedures found in F."

Jul 01 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

I could sure use slicing functionality like this.  Slices would have to be
capable of having a (possibly negative) stride for this to work

Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an
optimal implementation.  We need SIMD in a bad way.

Sean

"Mark Evans" <Mark_member pathlink.com> wrote in message
news:bdtrq6$1ohh$1 digitaldaemon.com...
 The only thing I wish to highlight is the 'slicing' and array

manipulation,
 nothing beyond that necessarily.  (FORTRAN is a column-major language, C

is
 row-major; be careful.)

 http://www.fortran.com/F/linux.htm
 http://www.fortran.com/F/



 "Array Language

 A sophisticated array language facilitates operations on whole arrays,
 contiguous and noncontiguous sections and slices of arrays. For example,

 arr(5:1:-2, 3, 6:)

 is a reference to the two-dimensional array created by taking the elements

5, 3,
 and 1 in the first dimension of arr and elements from 6 to the upper bound

of
 the third dimension of arr, all in the 3rd plan of the array. If arr is a

5 by 6
 by 7 array, the referenced elements would be (5,3,6), (3,3,6), (1,3,6),

(5,3,7),
 (3,3,7), and (1,3,7).

 A simpler example shows calculating the sum inner product of a row and a

column.
 A(i,j) = sum(B(i,:)*C(:,j))

 Sum is one of the more than one hundred intrinsic procedures found in F."

Jul 02 2003

"Walter" <walter digitalmars.com> writes:

"Sean L. Palmer" <palmer.sean verizon.net> wrote in message
news:bdv2h6$3vn$1 digitaldaemon.com...
 Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an
 optimal implementation.  We need SIMD in a bad way.

Would you like to take a stab at writing sum using SIMD?

Jul 02 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

I would be happy to.  But it'll have to wait til at least tomorrow because
right now I had a few beers and am not coding so well.  ;)

sum isn't easy with SSE because there's no direct instruction support for
it.  But it can be done using roughly 1/4 the number of instructions
required for FPU, if amortized over a large number of values.

For small arrays, say, 4 elements, it still takes 2 add's and 2 shuffle
instructions.  Those would be required to sum up the result register at the
end in any case.

A library implementation would be sub-optimal for constant sized arrays.
It'd be better off done as an intrinsic somewhat similar to how memcpy is
usually implemented in professional grade compilers.

Consider SSE code to Sum() a 16-byte aligned float[4] pointed to by eax, and
store into a float pointed to by ebx:

movaps xmm0, xmmword ptr [eax]   ; xmm0 = x y z w
movhlps xmm1, xmm0                      ; xmm1 = z w ? ?
addps xmm0, xmm1                         ; xmm0 = (x+z) (y+w) ? ?
shufps xmm1, xmm0, 1                     ; xmm1 = (y+w) ? ? ?
addss xmm0, xmm1                         ; xmm0 = (x+y+z+w) ? ? ?
movss dword ptr [ebx], xmm0

Obviously a canned library routine knowing nothing about the size or
alignment will not be able to come even close to this kind of performance
for such a small array.  If the values start off and end up in registers,
even better.  Plus the optimizer can schedule the instructions wisely.  The
above has many dependency stall slots where other instructions can be
inserted.

Now to extend that to handle any sized array, you first sum up as much as
possible in groups of 4 like so:

mov eax, dword ptr [src]
mov ecx, dword ptr [count]
xorps xmm0,xmm0                    ; xmm0 = 0 0 0 0
shr ecx,2
jz done
sumloop:
movaps xmm1, xmmword ptr [eax]   ; obviously this can be unrolled and
pipelined better
add eax, 16
addps xmm0, xmm1
dec ecx
jnz sumloop
done:

Ok, so maybe I actually can code a little even while drunk.  ;)   You know
what they say:  if you drink and code, statistically your program is more
likely to crash.

Then sum up the 4 partial results you're left with using the first code
snippet, and handle any leftovers using movss/addss.  You can use
movss/addss to do the first part of a badly-aligned array as well, then once
you find the alignment boundary, go to the fast aligned loop.

I could definitely write up a somewhat less than optimal SSE or MMX
implementation to get you started, but the above pretty much covers it.  If
you can write memcpy, you can write this.  I guess lack of time is a major
factor.

Sean

P.S. Whatever happened to all those MMX/SSE/3dNow functions Burton and I put
together for you?

"Walter" <walter digitalmars.com> wrote in message
news:bdv7pb$9es$4 digitaldaemon.com...
 "Sean L. Palmer" <palmer.sean verizon.net> wrote in message
 news:bdv2h6$3vn$1 digitaldaemon.com...
 Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an
 optimal implementation.  We need SIMD in a bad way.

 Would you like to take a stab at writing sum using SIMD?

Jul 02 2003

Ilya Minkov <Ilya_member pathlink.com> writes:

hm... how about embedding something like this:

http://www.dcs.gla.ac.uk/~wpc/reports/compilers/compilerindex/x25.html


In article <be0g0d$1itj$1 digitaldaemon.com>, Sean L. Palmer says...
P.S. Whatever happened to all those MMX/SSE/3dNow functions Burton and I put
together for you?

"Walter" <walter digitalmars.com> wrote in message
news:bdv7pb$9es$4 digitaldaemon.com...
 "Sean L. Palmer" <palmer.sean verizon.net> wrote in message
 news:bdv2h6$3vn$1 digitaldaemon.com...
 Sum actually exists in Phobos/math2.d.    ;)    But it doesn't have an
 optimal implementation.  We need SIMD in a bad way.

 Would you like to take a stab at writing sum using SIMD?

Jul 03 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

Wow, is all I can say.

Why don't the compilers I have to use have this kind of stuff?  I'm living
in the friggin stone age.

That might be enough to make me switch back to Pascal!  ;)

Nah, I would miss generics too much.

Sean

"Ilya Minkov" <Ilya_member pathlink.com> wrote in message
news:be179r$2bk6$1 digitaldaemon.com...
 hm... how about embedding something like this:

 http://www.dcs.gla.ac.uk/~wpc/reports/compilers/compilerindex/x25.html

Jul 03 2003

Mark Evans <Mark_member pathlink.com> writes:

SIMD is not what the post was about.  The semantics of array slicing and 'views'
were the motivation.  That is what D needs fixed 'in a bad way.'

It is premature to worry about SIMD and it's a back-end issue anyway, not
important to the language as such.

Mark

Jul 03 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

Right.  It is however important to set up the semantics of the language in
such a way that efficient SIMD code generation is possible.  Whether or not
the early compilers emit SIMD, powerful, expressive array handling would be
a huge plus in the long run.

Sean

"Mark Evans" <Mark_member pathlink.com> wrote in message
news:be1n9b$2tn9$1 digitaldaemon.com...
 SIMD is not what the post was about.  The semantics of array slicing and

'views'
 were the motivation.  That is what D needs fixed 'in a bad way.'

 It is premature to worry about SIMD and it's a back-end issue anyway, not
 important to the language as such.

 Mark

Jul 03 2003

Mark Evans <Mark_member pathlink.com> writes:

Sean L. Palmer says...
Right.  It is however important to set up the semantics of the language in
such a way that efficient SIMD code generation is possible.  Whether or not
the early compilers emit SIMD, powerful, expressive array handling would be
a huge plus in the long run.

Sean

OK, so how do back-end SIMD issues affect array semantics?  All I gather from
your comments is that list operations should decompose into 4-number groups.  Is
that a killer?  The F people have paid much attention to low-level performance
issues such as address alignment.  I do not see big problems ahead.

Mark

Jul 05 2003

"Sean L. Palmer" <palmer.sean verizon.net> writes:

Mostly I think what is important is that array operations are not expressed
in terms of iterative loops, but by more implicit, inherently parallel
methods.  Otherwise the compiler has to reconstruct the parallel form from
the iterative form, which is difficult and painful, something not all
compilers would be able to do well.  So we need the ability to do operations
on entire array slices at a time, to do operations on array slices vs. array
slices, to compute functions on all array elements and store the results as
an array, that kind of thing.  a[] += b[] * c;  is in the D spec but is
currently unimplemented.

F probably has done it right.  The best thing I saw before that for array
manipulation was Yorick (
http://wuarchive.wustl.edu/languages/yorick/doc/index.html ) but Yorick is
intended for interpretation;  it seems to be headed in the wrong direction.

4-number groups doesn't really have anything to do with it.  If the arrays
in question happen to be multiples of 4, leftovers operations will not be
necessary, but you can SIMD-ify any size array operation.  SIMD will only
help (after the break-even point which is probably about size 3 or 4) and
the larger the array the more it will help.

From my experiences, once the array gets big enough that it won't all fit in
the processor cache, performance drops off dramatically, (memory bandwidth
costs start to dominate) but still is a win over FPU.

An intelligent SIMD backend could in fact SIMD-ify any loop, not just array
operations, by unrolling the loop 4 times and having each component of a
SIMD vector contain a variable from one iteration of the loop.  So it
executes 4 iterations at a time.  It could do this for any loop that doesn't
have much branching,  doesn't depend on the results of prior iterations, or
for which the compiler can predict and control such dependency.  But that
has nothing whatsoever to do with array semantics.

Sean

"Mark Evans" <Mark_member pathlink.com> wrote in message
news:be83ou$38p$1 digitaldaemon.com...
 Sean L. Palmer says...
Right.  It is however important to set up the semantics of the language


in
such a way that efficient SIMD code generation is possible.  Whether or


not
the early compilers emit SIMD, powerful, expressive array handling would


be
a huge plus in the long run.

Sean

 OK, so how do back-end SIMD issues affect array semantics?  All I gather

from
 your comments is that list operations should decompose into 4-number

groups.  Is
 that a killer?  The F people have paid much attention to low-level

performance
 issues such as address alignment.  I do not see big problems ahead.

 Mark

Jul 06 2003

Mark Evans <Mark_member pathlink.com> writes:

Yes the nice array semantics are desirable, but are tautology as an answer to
the question.  The original worry was that array semantics might prohibit SIMD.
Now the claim is that SIMD has no impact on array semantics.

Incidentally, a Google search on "array language" turns up many interesting
links.

Mark

Jul 07 2003

D Programming

C/C++ Programming

Other

D - F Language