digitalmars.D - Array operations, C#, etc
- bearophile (35/35) Nov 03 2008 Mono for gaming and higher performance:
Mono for gaming and higher performance:
http://tirania.org/tmp/PC54-slides-as-pdf.pdf

The link comes from this blog post:
http://tirania.org/blog/archive/2008/Nov-03.html

Scripting languages like Lua/Python for AI in games are mentioned (D isn't listed there; maybe they think D is a dinosaur like C++).

Near the end, the slide set also shows the approach taken by Mono to use the SIMD instructions of the CPU, defining many types like:

    Mono.Simd.Vector16b  - 16 unsigned bytes
    Mono.Simd.Vector16sb - 16 signed bytes
    Mono.Simd.Vector2d   - 2 doubles
    Mono.Simd.Vector2l   - 2 signed 64-bit longs
    Mono.Simd.Vector2ul  - 2 unsigned 64-bit longs
    Mono.Simd.Vector4f   - 4 floats
    etc...

Operations on them are translated into SIMD instructions.

D instead augments all arrays with array operations, but then it also has to manage the cases where lengths aren't exact multiples of the SIMD register width. I think that such length management is done at runtime, so you pay a small price when you have just a few items, like 4 floats, a price that you don't pay with, for example, Mono.Simd.Vector4f.

When array sizes are fixed and known at compile time, like in this situation:

    void main() {
        float[4] a = [1.0, 2.0, 3.0, 4.0];
        float[4] b = [10.0, 20.0, 30.0, 40.0];
        float[4] s;
        s[] = a[] + b[];
    }

the compiler could, inside the arrayfloat._arraySliceSliceAddSliceAssign_f() function, use compile-time information to skip the runtime length checks and fallbacks, using some static ifs. The purpose is to make it produce only the naked instructions in that case (I may write a little benchmark in D with inline asm to compare the speed of the s[] = a[] + b[] line):

    movups (%eax),%xmm0
    movups (%edi),%xmm1
    addps %xmm1,%xmm0
    movups %xmm0,(%eax)

I think another small, and less easy to solve, problem comes from this check near the beginning of that _arraySliceSliceAddSliceAssign_f function:

    if (sse() && ...
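To make the shape of the problem concrete, here is a minimal sketch. It is not the actual runtime library code: addSlices and addFixed are hypothetical names, and the inner loops are plain D that only mark where SSE instructions would go. It assumes druntime's core.cpuid.sse property as a stand-in for the sse() call in the snippet above. addSlices shows the runtime feature test, the blocks of four and the scalar tail; addFixed shows how a static if on a compile-time length could bypass all of that.

```d
import core.cpuid : sse;   // druntime CPU feature query (assumed stand-in for sse())

// Hypothetical stand-in for the library routine: runtime checks on every call.
void addSlices(float[] dst, const(float)[] a, const(float)[] b)
{
    assert(dst.length == a.length && dst.length == b.length);
    size_t i = 0;

    if (sse && dst.length >= 4)            // runtime feature and length test
    {
        // Whole blocks of 4 floats: this is where movups/addps would be used.
        for (; i + 4 <= dst.length; i += 4)
            foreach (j; i .. i + 4)
                dst[j] = a[j] + b[j];
    }

    // Scalar tail (or the whole array when the fast path was not taken).
    for (; i < dst.length; ++i)
        dst[i] = a[i] + b[i];
}

// Hypothetical fixed-length entry point: N is known at compile time,
// so static if can drop the length test and the tail for N == 4.
void addFixed(size_t N)(ref float[N] dst, const float[N] a, const float[N] b)
{
    static if (N == 4)
    {
        // Exactly one block; ideally this would become the four naked instructions.
        foreach (j; 0 .. 4)
            dst[j] = a[j] + b[j];
    }
    else
    {
        addSlices(dst[], a[], b[]);        // generic path for other lengths
    }
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [10.0f, 20.0f, 30.0f, 40.0f];
    float[4] s;
    addFixed(s, a, b);
    assert(s[] == [11.0f, 22.0f, 33.0f, 44.0f]);

    float[] x = [1.0f, 2, 3, 4, 5, 6];     // length not a multiple of 4
    float[] y = [1.0f, 1, 1, 1, 1, 1];
    auto z = new float[](6);
    addSlices(z, x, y);
    assert(z == [2.0f, 3, 4, 5, 6, 7]);
}
```

Note that in this sketch addFixed still leaves the CPU feature question open, and every call to addSlices pays for the sse test and the length comparison, even for four elements.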
That check is probably quick, but if you have to sum just 4 floats in a loop I presume it may slow the code down somewhat (there's also the function call, which isn't inlined). The availability of SSE can't be determined at compile time because you don't know where the code will run (though eventually a compiler argument could be added to specify that the program will run only on CPUs with SSE2, etc.).

A brutal solution is to duplicate the object code of the functions that contain SSE instructions (one copy with them, one without), so that at program startup you can change the jump pointers of the function calls in the whole program once :-) It may make the executable bigger, but seeing how executables are often 300+ KB I don't think that's a big problem, and I presume only a few functions will contain array operations.

Bye,
bearophile
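A rough sketch of the "two copies, one pointer" idea, written at the source level rather than by patching object code. All names here (AddFn, addScalar, addSse, addFloats) are hypothetical, the "SSE" copy is plain D that only marks where SSE instructions would go, and core.cpuid.sse is assumed as the feature query. The pointer is set once in a module constructor, so the calls themselves carry no feature test.

```d
import core.cpuid : sse;   // druntime CPU feature query (assumption)

alias AddFn = void function(float[] dst, const(float)[] a, const(float)[] b);

// Copy 1: no SSE, plain element-by-element loop.
void addScalar(float[] dst, const(float)[] a, const(float)[] b)
{
    foreach (i; 0 .. dst.length)
        dst[i] = a[i] + b[i];
}

// Copy 2: the version that would contain SSE instructions.
void addSse(float[] dst, const(float)[] a, const(float)[] b)
{
    size_t i = 0;
    for (; i + 4 <= dst.length; i += 4)    // blocks of 4: movups/addps would go here
        foreach (j; i .. i + 4)
            dst[j] = a[j] + b[j];
    for (; i < dst.length; ++i)            // leftover elements
        dst[i] = a[i] + b[i];
}

// The "jump pointer": defaults to the safe copy.
__gshared AddFn addFloats = &addScalar;

// Runs once at program startup, before main().
shared static this()
{
    if (sse)
        addFloats = &addSse;
}

void main()
{
    float[4] a = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] b = [10.0f, 20.0f, 30.0f, 40.0f];
    float[4] s;
    addFloats(s[], a[], b[]);              // no feature test at the call site
    assert(s[] == [11.0f, 22.0f, 33.0f, 44.0f]);
}
```

The indirect call is still a call, so this removes only the repeated feature test, not the call overhead mentioned above.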
Nov 03 2008