
digitalmars.D.announce - Re: DMD 1.034 and 2.018 releases

reply Pete <example example.com> writes:
Walter Bright Wrote:

 This one has (finally) got array operations implemented. For those who 
 want to show off their leet assembler skills, the initial assembler 
 implementation code is in phobos/internal/array*.d. Burton Radons wrote 
 the assembler. Can you make it faster?
 
 http://www.digitalmars.com/d/1.0/changelog.html
 http://ftp.digitalmars.com/dmd.1.034.zip
 
 http://www.digitalmars.com/d/2.0/changelog.html
 http://ftp.digitalmars.com/dmd.2.018.zip

Not sure if someone else has already mentioned this, but would it be possible for the compiler to align these arrays on 16-byte boundaries in order to maximise any possible vector efficiency? AFAIK you can't actually specify anything higher than align 8 at the moment, which is a bit of a problem.

Regards,
Aug 11 2008
next sibling parent Walter Bright <newshound1 digitalmars.com> writes:
Pete wrote:
 Not sure if someone else has already mentioned this but would it be
 possible for the compiler to align these arrays on 16 byte boundaries
 in order to maximise any possible vector efficiency. AFAIK you can't
 actually specify align anything higher than align 8 at the moment
 which is a bit of a problem.

Anything allocated with new will be aligned on 16 byte boundaries.
Aug 11 2008
prev sibling parent reply Georg Lukas <georg op-co.de> writes:
On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
 Walter Bright Wrote:
 This one has (finally) got array operations implemented. For those who
 want to show off their leet assembler skills, the initial assembler
 implementation code is in phobos/internal/array*.d. Burton Radons wrote
 the assembler. Can you make it faster?

Not sure if someone else has already mentioned this but would it be possible for the compiler to align these arrays on 16 byte boundaries in order to maximise any possible vector efficiency. AFAIK you can't actually specify align anything higher than align 8 at the moment which is a bit of a problem.

From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.:

    a = 0xf00d0013 (3 mod 16)
    b = 0xdeaffff3 (3 mod 16)

In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest. This would also work for slices, at least when both slices have the same alignment remainder. I'm just not sure what overhead such a solution would impose for small arrays.

Georg
-- 
|| http://op-co.de ++ GCS/CM d? s: a-- C+++ UL+++ !P L+++ E--- W++ ++
|| gpg: 0x962FD2DE || N++ o? K- w---() O M V? PS+ PE-- Y+ PGP++ t* ||
|| Ge0rG: euIRCnet || 5 X+ R tv b+(+++) DI+(+++) D+ G e* h! r* !y+ ||
++ IRCnet OFTC OPN ||________________________________________________||
Aug 12 2008
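
A minimal sketch of the split Georg describes, written in current D2 syntax rather than taken from phobos/internal/array*.d. The function name addPeeled, the 16-byte constant, and the choice of an in-place int add are all assumptions made for illustration; an aligned SSE loop would take the place of the array operation in the body. The two slices are assumed not to overlap.

```d
// Sketch only: element-wise dst += src, split into head / aligned body / tail.
// addPeeled is a hypothetical name; this is not the Phobos implementation.
void addPeeled(int[] dst, const(int)[] src)
{
    assert(dst.length == src.length);
    enum size_t alignBytes = 16;

    // Alignment remainder of each pointer (address mod 16).
    size_t dstRem = cast(size_t) dst.ptr & (alignBytes - 1);
    size_t srcRem = cast(size_t) src.ptr & (alignBytes - 1);

    // Different remainders: both pointers can never be aligned at the
    // same element, so use the plain loop for everything.
    if (dstRem != srcRem || dstRem % int.sizeof != 0)
    {
        foreach (size_t i; 0 .. dst.length)
            dst[i] += src[i];
        return;
    }

    // Number of elements before the next 16-byte boundary.
    size_t head = ((alignBytes - dstRem) & (alignBytes - 1)) / int.sizeof;
    if (head > dst.length)
        head = dst.length;

    // Head: regular D code.
    foreach (size_t i; 0 .. head)
        dst[i] += src[i];

    // Body: whole 16-byte blocks; both pointers are aligned here.
    // An aligned SSE loop (movdqa) would replace this array operation.
    enum size_t perBlock = alignBytes / int.sizeof;
    size_t bodyEnd = head + (dst.length - head) / perBlock * perBlock;
    dst[head .. bodyEnd] += src[head .. bodyEnd];

    // Tail: leftover elements that do not fill a block.
    foreach (size_t i; bodyEnd .. dst.length)
        dst[i] += src[i];
}
```

With a = 0x...13 and b = 0x...f3 only whole ints can be peeled, so a remainder of 3 would fall into the plain-loop branch here; a byte-wise variant would peel the 13 bytes exactly as in the post.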
next sibling parent reply Don <nospam nospam.com.au> writes:
Georg Lukas wrote:
 On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
 Walter Bright Wrote:
 This one has (finally) got array operations implemented. For those who
 want to show off their leet assembler skills, the initial assembler
 implementation code is in phobos/internal/array*.d. Burton Radons wrote
 the assembler. Can you make it faster?

Not sure if someone else has already mentioned this but would it be possible for the compiler to align these arrays on 16 byte boundaries in order to maximise any possible vector efficiency. AFAIK you can't actually specify align anything higher than align 8 at the moment which is a bit of a problem.

From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.: a = 0xf00d0013 (3 mod 16) b = 0xdeaffff3 (3 mod 16) In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest. This would also work for slices, at least when both slices have the same alignment remainder. I'm just not sure what overhead such a solution would impose for small arrays.

Just begin with a check for minimal size. If less than that size, don't use SSE at all.
 
 Georg

Aug 13 2008
parent "Dave" <Dave_member pathlink.com> writes:
"Don" <nospam nospam.com.au> wrote in message 
news:g7u36h$20j0$1 digitalmars.com...
 Georg Lukas wrote:
 On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
 Walter Bright Wrote:
 This one has (finally) got array operations implemented. For those who
 want to show off their leet assembler skills, the initial assembler
 implementation code is in phobos/internal/array*.d. Burton Radons wrote
 the assembler. Can you make it faster?

Not sure if someone else has already mentioned this but would it be possible for the compiler to align these arrays on 16 byte boundaries in order to maximise any possible vector efficiency. AFAIK you can't actually specify align anything higher than align 8 at the moment which is a bit of a problem.

From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.: a = 0xf00d0013 (3 mod 16) b = 0xdeaffff3 (3 mod 16) In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest.


Good idea. Right now in that code there is (usually) a case for both un/aligned. It typically goes like this:

    if (cpu_has_sse2 && a.length > min_size)
    {
        if (((cast(size_t) aptr | cast(size_t) bptr | cast(size_t) cptr) & 15) != 0)
        {
            // Unaligned case
            asm
            {
                ...
                movdqu XMM0, [EAX];
                ...
            }
        }
        else
        {
            // Aligned case
            asm
            {
                ...
                movdqa XMM0, [EAX];
                ...
            }
        }
    }

The two blocks of asm code are basically identical except for the un/aligned SSE opcodes. With your idea, one could get rid of the test for alignment, probably some bloat, and a whole lot of duplication. I guess the question would be whether the overhead of your idea would be less than that of the current design.

- Dave
 This would also work for slices, at least when both slices have the same 
 alignment remainder. I'm just not sure what overhead such a solution 
 would impose for small arrays.

Just begin with a check for minimal size. If less than that size, don't use SSE at all.
 Georg 


Aug 13 2008
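
The alignment test in Dave's snippet relies on a small bit trick that is easy to miss: OR-ing the three addresses together keeps any low bit that is set in any one of them, so a single mask against 15 covers all three pointers. A sketch of just that test, with allAligned16 as a hypothetical helper name that does not appear in the thread or in Phobos:

```d
// Sketch only: true when all three pointers are multiples of 16.
// allAligned16 is an illustrative name, not part of array*.d.
bool allAligned16(const(void)* a, const(void)* b, const(void)* c)
{
    // A set bit in the low four bits of any address survives the OR,
    // so one mask-and-compare checks every pointer at once.
    return ((cast(size_t) a | cast(size_t) b | cast(size_t) c) & 15) == 0;
}
```

A true result would select the movdqa block and a false result the movdqu block, which is the same decision as in the code above with the condition inverted.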
prev sibling parent JAnderson <ask me.com> writes:
Georg Lukas wrote:
 On Mon, 11 Aug 2008 09:55:26 -0400, Pete wrote:
 Walter Bright Wrote:
 This one has (finally) got array operations implemented. For those who
 want to show off their leet assembler skills, the initial assembler
 implementation code is in phobos/internal/array*.d. Burton Radons wrote
 the assembler. Can you make it faster?

Not sure if someone else has already mentioned this but would it be possible for the compiler to align these arrays on 16 byte boundaries in order to maximise any possible vector efficiency. AFAIK you can't actually specify align anything higher than align 8 at the moment which is a bit of a problem.

From a short look at the array*.d source code, it would be better to check if source and destination have the same alignment, i.e.: a = 0xf00d0013 (3 mod 16) b = 0xdeaffff3 (3 mod 16) In that case, the first 16-3 = 13 bytes can be handled using regular D code, and the aligned SSE version can be used for the rest. This would also work for slices, at least when both slices have the same alignment remainder. I'm just not sure what overhead such a solution would impose for small arrays.

There would be some overhead for small arrays. However, as I said in my previous email, if you're using a small array then it's likely that you're not doing much. If it is a performance issue, you should switch to a larger array (by grouping all your smaller ones together). Of course, there's the edge case where someone actually needs to do a g-billion operations on exactly the same small array.
 
 Georg

-Joel
Aug 13 2008
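
One possible reading of the grouping suggestion, as a rough sketch in current D2 syntax: copy the small arrays into one contiguous buffer, keep a slice for each original region, and run a single array operation over the whole buffer. The names pack, whole, and views are made up for this example, and whether the copy pays for itself would have to be measured.

```d
// Sketch only: pack several small arrays into one buffer so that a single
// array operation covers all of them. All names here are hypothetical.
int[][] pack(const(int[])[] parts, out int[] whole)
{
    // Total number of elements across all parts.
    size_t total = 0;
    foreach (p; parts)
        total += p.length;

    whole = new int[](total);
    auto views = new int[][](parts.length);

    // Copy each part into the buffer and remember the slice that views it.
    size_t pos = 0;
    foreach (size_t i, p; parts)
    {
        whole[pos .. pos + p.length] = p[];
        views[i] = whole[pos .. pos + p.length];
        pos += p.length;
    }
    return views;
}
```

After packing, an operation such as `whole[] *= 2;` touches every element in one pass, and each entry of the returned slices sees the updated values for its own region.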