digitalmars.D - Optimizing a raytracer
- "Róbert László Páli" (23/23) Oct 16 2013 Hello!
- Jacob Carlborg (6/24) Oct 16 2013 I would say use structs. For compiler I would go with LDC or GDC. Both
- finalpatch (4/8) Oct 16 2013 I find it critical to ensure all loops are unrolled in basic
- ponce (6/12) Oct 16 2013 Yes, by all means use struct.
- "Róbert László Páli" (22/30) Oct 17 2013 Thank you for the advice!
- bearophile (7/10) Oct 17 2013 Using a double4 could improve the performance of your code, but
- "Róbert László Páli" (7/13) Mar 26 2014 I sadly could not get it to work properly, but the performance
- "Róbert László Páli" (2/2) Mar 26 2014 Oh, thanks for all of your help. Nice
- Bienlein (3/3) Mar 26 2014 You can also achieve significant speed-ups by doing things in
- "Róbert László Páli" (5/5) Mar 26 2014 Thanks! I already do tracing the samples parallel.
Hello! I am writing an unbiased raytracing renderer in D. I have made good progress, but I want to make it as fast as possible wherever I can do so without compromises.

I use a struct with three doubles for vector and color calculations, with operator overloading. Many vectors and colors are created during the tracing calculations. I thought that using classes might require too much memory, because they are not destroyed at scope end, and might also cost speed when the GC kicks in. Is my assumption correct that structs are the wiser choice in this case?

To avoid constructing many vectors and colors, I thought of using ref arguments, but I also heard that functions with ref arguments are not inlined. What would generate the fastest code for a cross product, for example?

Which compiler and compilation flags should I use to generate the fastest code? My main target is 64-bit machines, cross-platform. Which optimizations can I assume for the various compilers? Are local variables that are used only once inlined? So is it safe to extract local variables purely to make the code easier to understand?

Thanks in advance! Róbert László Páli
Oct 16 2013
On 2013-10-16 14:02, "Róbert László Páli" wrote:
> Hello! I am writing an unbiased raytracing renderer in D. I have made good progress, but I want to make it as fast as possible wherever I can do so without compromises. I use a struct with three doubles for vector and color calculations, with operator overloading. Many vectors and colors are created during the tracing calculations. I thought that using classes might require too much memory, because they are not destroyed at scope end, and might also cost speed when the GC kicks in. Is my assumption correct that structs are the wiser choice in this case? To avoid constructing many vectors and colors, I thought of using ref arguments, but I also heard that functions with ref arguments are not inlined. What would generate the fastest code for a cross product, for example? Which compiler and compilation flags should I use to generate the fastest code? My main target is 64-bit machines, cross-platform. Which optimizations can I assume for the various compilers? Are local variables that are used only once inlined? So is it safe to extract local variables purely to make the code easier to understand?

I would say use structs. As for the compiler, I would go with LDC or GDC. Both of these are faster than DMD for floating-point calculations. You can always benchmark.

--
/Jacob Carlborg
Oct 16 2013
I find it critical to ensure all loops are unrolled in basic vector ops (copy/arithmetic/dot etc.)

On Wednesday, 16 October 2013 at 12:02:15 UTC, Róbert László Páli wrote:
> Hello! I am writing an unbiased raytracing renderer in D. I have made good progress, but I want to make it as fast as possible wherever I can do so without compromises.
Oct 16 2013
On Wednesday, 16 October 2013 at 12:02:15 UTC, Róbert László Páli wrote:
> I thought that using classes might require too much memory, because they are not destroyed at scope end, and might also cost speed when the GC kicks in. Is my assumption correct that structs are the wiser choice in this case?

Yes, by all means use structs.

> What would generate the fastest code for a cross product, for example?

If you are on x86, SSE 4.1 introduced an instruction called DPPS, which performs a dot product. Maybe you can coax it into doing a cross product with clever swizzles and masks.
Oct 16 2013
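For readers following the cross-product question, here is a minimal sketch in D of a struct-based vector with a hand-written cross product, which is the approach the replies above favor. The names `Vec3`, `dot`, and `cross` are assumptions made for illustration, not the original poster's code. Arguments are passed by value here; whether that or `ref` is faster with a given compiler is something to measure rather than take from this sketch.

```d
/// Hypothetical 3D vector type (the name and layout are assumptions).
struct Vec3
{
    double x, y, z;

    /// Component-wise addition and subtraction.
    Vec3 opBinary(string op)(Vec3 rhs) const
        if (op == "+" || op == "-")
    {
        return mixin("Vec3(x " ~ op ~ " rhs.x, y " ~ op ~ " rhs.y, z " ~ op ~ " rhs.z)");
    }

    /// Scaling by a scalar.
    Vec3 opBinary(string op)(double s) const
        if (op == "*")
    {
        return Vec3(x * s, y * s, z * s);
    }
}

/// Dot product, written out by hand (no loop to unroll).
double dot(Vec3 a, Vec3 b) pure nothrow @nogc @safe
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/// Cross product, written out by hand.
Vec3 cross(Vec3 a, Vec3 b) pure nothrow @nogc @safe
{
    return Vec3(a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x);
}

unittest
{
    auto ex = Vec3(1, 0, 0);
    auto ey = Vec3(0, 1, 0);
    assert(cross(ex, ey) == Vec3(0, 0, 1)); // right-handed basis
    assert(dot(ex, ey) == 0);
    assert((ex + ey) * 2.0 == Vec3(2, 2, 0));
}
```

The `unittest` block doubles as a usage example; with DMD it can be run by compiling with `-unittest -main`.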
Jacob Carlborg:
> I would say use structs. For compiler I would go with LDC or GDC. Both of these are faster for floating point calculations than DMD. You can always benchmark.

Thank you for the advice! I installed LDC and used ldmd2. The benchmarks are amazing! :O

DMD: compile = 2503, run = 26210
LDMD: compile = 3953, run = 8935

These are in milliseconds, measured with the time command. Both were compiled with the same flags: -O -inline -release -noboundscheck

finalpatch:
> I find it critical to ensure all loops are unrolled in basic vector ops (copy/arithmetic/dot etc.)

In these crucial parts I don't use loops; I wrote these operations out by hand. There are simply three named doubles. But thanks for the advice.

ponce:
> If you are on x86, SSE 4.1 introduced an instruction called DPPS which performs a dot product. Maybe you can force it into doing a cross-product with clever swizzles and masks.

Could you give me a hint on how that dot product could be used from D? I am not experienced with such low-level programming. And would you suggest trying SIMD double4 for 3D vectors? It would take some time to change the code.
Oct 17 2013
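The figures above were taken with the `time` command, which also counts process startup and any setup work. A minimal sketch of timing only the hot section from inside the program follows, assuming a reasonably recent Phobos that provides `std.datetime.stopwatch`. The helper name `timeIt` and the dummy workload are hypothetical and stand in for the real render call.

```d
import core.time : Duration;
import std.datetime.stopwatch : AutoStart, StopWatch;

/// Hypothetical helper: runs `work` `runs` times and returns the
/// total elapsed wall-clock time.
Duration timeIt(scope void delegate() work, size_t runs = 1)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. runs)
        work();
    sw.stop();
    return sw.peek();
}

unittest
{
    import std.stdio : writeln;

    double sum = 0;
    // Dummy workload standing in for the render loop.
    auto elapsed = timeIt({
        foreach (i; 0 .. 1_000_000)
            sum += i * 0.5;
    }, 3);
    writeln("3 runs took ", elapsed.total!"msecs", " ms (sum = ", sum, ")");
}
```

Running the same measurement on binaries built with DMD and with LDC, using identical flags, gives a comparison of the render code alone.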
Róbert László Páli:
> And would you suggest to try to use SIMD double4 for 3D vectors? It would take some time to change code.

Using a double4 could improve the performance of your code, but it must be used wisely. (One general tip is to avoid mixing SIMD and serial code. If you want to use SIMD code, then it's often better to keep using SIMD registers even if you have only one value.)

Bye,
bearophile
Oct 17 2013
> Using a double4 could improve the performance of your code, but it must be used wisely. (One general tip is to avoid mixing SIMD and serial code. if you want to use SIMD code, then it's often better to keep using SIMD registers even if you have one value).

Sadly, I could not get it to work properly, but the performance seems good so far. Theoretically, I would only need to adjust the Vector struct and its operations (a thin layer of the code; the rest uses only the Vector type and its operations, not its internals). In case you are interested: http://palaes.rudanium.org/SubSpace/render.php
Mar 26 2014
Oh, thanks for all of your help. Nice to see that the D guys really do help. :)
Mar 26 2014
You can also achieve significant speed-ups by doing things in parallel; for example, see https://groups.google.com/forum/?hl=de#!searchin/golang-nuts/ray$20tracer/golang-nuts/mxYzHQSV3rw/dOA78aeVLgEJ
Mar 26 2014
Thanks! I already trace the samples in parallel. Strangely, I have a Core 2 Duo and it seems that using 3 threads works best (slightly better than 2), although this might be accidental. Maybe the more complex samples are then distributed more evenly across the separate threads.
Mar 26 2014
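As an illustration of the parallel tracing discussed above, here is a minimal sketch using `std.parallelism` from Phobos. The functions `traceRow` and `renderRows` are hypothetical stand-ins, not the poster's renderer, and the per-row computation is a dummy. The work unit size of 1 hands out rows one at a time, which is one possible way to even out the load when some rows are much more expensive than others; whether it helps in a real renderer would need measuring.

```d
import std.parallelism : parallel;

/// Hypothetical stand-in for tracing one image row; returns a dummy value.
double traceRow(size_t y, size_t width) pure nothrow @nogc @safe
{
    double acc = 0;
    foreach (x; 0 .. width)
        acc += (x ^ y) & 0xFF;
    return acc / width;
}

/// Computes all rows in parallel on the default task pool.
double[] renderRows(size_t width, size_t height)
{
    auto rows = new double[](height);
    // Each iteration writes only to its own element, so no locking is needed.
    foreach (y, ref row; parallel(rows, 1))
        row = traceRow(y, width);
    return rows;
}

unittest
{
    auto rows = renderRows(4, 3);
    assert(rows.length == 3);
    // The parallel result matches the sequential computation.
    foreach (y, value; rows)
        assert(value == traceRow(y, 4));
}
```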