digitalmars.D.ldc - Looking for more library optimization patterns
- Kai Nacke (11/11) Jan 29 2014 Hi all!
- bearophile (31/40) Jan 30 2014 A de-optimization I'd really like in LDC (and dmd) is to not call
- Stanislav Blinov (7/7) Jan 30 2014 Being somewhat involved in aforementioned thread, I second that.
- Kai Nacke (6/6) Feb 10 2014 Thanks! That are good hints!
- Andrea Fontana (5/11) Feb 11 2014 Don't know if it is the case but what about using appender
- bearophile (6/7) Feb 11 2014 There are plenty of them to find. But I think you need a strategy
- Ivan Kazmenko (12/22) Feb 15 2014 For short loops, an unrolled version like
- bearophile (5/14) Feb 15 2014 Yes, but ldc is plenty able to unroll small loops with length
- Temtaime (5/5) Apr 18 2014 What about SSE?
- safety0ff (7/10) Jul 30 2014 I just learned about this issue while looking at
Hi all! LDC includes the SimplifyDRuntimeCalls pass which should replace D library calls with more efficient ones. This is a clone of a former LLVM pass which did the same for C runtime calls. An example is that printf("foo\n") is replaced by puts("foo"). Do you know of similar patterns in D which are worth to be implemented? E.g. replacing std.stdio.writefln("foo") with std.stdio.writeln("foo") Regards, Kai
Jan 29 2014
Kai Nacke:LDC includes the SimplifyDRuntimeCalls pass which should replace D library calls with more efficient ones. This is a clone of a former LLVM pass which did the same for C runtime calls. An example is that printf("foo\n") is replaced by puts("foo"). Do you know of similar patterns in D which are worth to be implemented? E.g. replacing std.stdio.writefln("foo") with std.stdio.writeln("foo")A de-optimization I'd really like in LDC (and dmd) is to not call the run-time functions when you perform a verctor operation on arrays statically known to be very short: void main() { int[3] a, b; a[] += b[]; } And just rewrite that with a for loop (as done for cases where the run time function is not available), and let ldc2 compile it on its own. See a thread in D.learn: http://forum.dlang.org/thread/lqmqsnucadaqlkxkoffc forum.dlang.org I'd like to replace code like: size_t i = 0; foreach (immutable j, const ref bj; bodies) { foreach (const ref bk; bodies[j + 1 .. $]) { foreach (immutable m; TypeTuple!(0, 1, 2)) r[i][m] = bj.x[m] - bk.x[m]; i++; } } With: size_t i = 0; foreach (immutable j, const ref bj; bodies) foreach (const ref bk; bodies[j + 1 .. $]) r[i++][] = bj.x[] - bk.x[]; And keep the same performance, instead of seeing a significant slowdown. Bye, bearophile
Jan 30 2014
Being somewhat involved in aforementioned thread, I second that. There seems to be no logical reason to insert a call dSliceOpAssignOrWhateverItIsCalled when all the data is actually there. At least, in -release -O3 builds :) Granted, this may be tricky for arbitrarily-typed arrays, but for builtins with statically known array sizes? That's a definitive win.
Jan 30 2014
Thanks! That are good hints! I am not sure if I can implement them soon. But I put them on my list for the pass. If you have more of those hints: please post! Regards, Kai
Feb 10 2014
On Monday, 10 February 2014 at 19:51:33 UTC, Kai Nacke wrote:Thanks! That are good hints! I am not sure if I can implement them soon. But I put them on my list for the pass. If you have more of those hints: please post! Regards, KaiDon't know if it is the case but what about using appender instead of concatenation (for example for concatenations inside loops...)? Andrea
Feb 11 2014
Kai Nacke:If you have more of those hints: please post!There are plenty of them to find. But I think you need a strategy like this: http://en.wikipedia.org/wiki/Brainstorming to find them. D conferences should be good for this purpose. Bye, bearophile
Feb 11 2014
On Thursday, 30 January 2014 at 18:36:09 UTC, bearophile wrote:A de-optimization I'd really like in LDC (and dmd) is to not call the run-time functions when you perform a verctor operation on arrays statically known to be very short: void main() { int[3] a, b; a[] += b[]; } And just rewrite that with a for loop (as done for cases where the run time function is not available), and let ldc2 compile it on its own.For short loops, an unrolled version like a[0] += b[0]; a[1] += b[1]; a[2] += b[2]; may well be faster than a simple loop as the following one: foreach (immutable i; 0..3) { a[i] += b[i]; } At least on x86/64. Will that optimization happen too, at a later stage? Ivan Kazmenko.
Feb 15 2014
Ivan Kazmenko:For short loops, an unrolled version like a[0] += b[0]; a[1] += b[1]; a[2] += b[2]; may well be faster than a simple loop as the following one: foreach (immutable i; 0..3) { a[i] += b[i]; } At least on x86/64.Yes, but ldc is plenty able to unroll small loops with length known at compile time. Bye, bearophile
Feb 15 2014
What about SSE? Can float[16] a, b; a[] += b[]; Be optimized by using only 4 addps ?
Apr 18 2014
On Thursday, 30 January 2014 at 18:36:09 UTC, bearophile wrote:A de-optimization I'd really like in LDC (and dmd) is to not call the run-time functions when you perform a verctor operation on arrays statically known to be very short:I just learned about this issue while looking at http://rosettacode.org/wiki/Hamming_numbers#D (Alternative version 2, opEquals) It makes me sad to learn that slice / array operations get turned into monstrosities by the compiler. There are some huge wins to be had in this area.
Jul 30 2014