digitalmars.D - 3d graphics float "benchmarks"
- redditor (6/6) Mar 24 2009 MiniLight: a minimal global illumination renderer (re)written in Scala, ...
- bearophile (255/258) Mar 24 2009 The D#2 version is translated from C, the D#3 comes mostly from C#.
- bearophile (15/15) Mar 24 2009 Replacing "double" with "float" in all the C/D program the output image...
- bearophile (17/17) Mar 27 2009 I give advice to new Python programmers, they are often people coming fr...
- Moritz Warning (3/15) Mar 24 2009 I'm in the process to port it to D just for fun.
MiniLight: a minimal global illumination renderer (re)written in Scala, OCaml, Python, Ruby, Lua, Flex and C++: http://www.hxa.name/minilight/

I think it would be interesting to have a D version and compare for the float performance and maybe even lines of code.

There is another similar "benchmark" here: http://lucille.atso-net.jp/aobench/ (from reddit)

I'm too much of a D noobie myself to attempt this.
Mar 24 2009
redditor:
> There is another similar "benchmark" here: http://lucille.atso-net.jp/aobench/

I may send this D2 version to the original author...

WIDTH = 256
HEIGHT = 256
NSUBSAMPLES = 2
NAO_SAMPLES = 8

Timings, seconds:
  Python: 193.6
  Psyco: 49.80
  C gcc: 3.98
  C llvm-gcc: 3.84

Psyco timings may be improved using classes.
I have tried to create a ShedSkin version too.
presume the timings aren't far from the llvm-gcc ones).

D code compiled with:
  DMD v1.041
  -O -release -inline

C code compiled with:
  gcc: V. 4.3.3-dw2-tdm-1 (GCC)
  LLVM: gcc version 4.2.1 (Based on Apple Inc. build 5636) (LLVM build)
  For both: -Wall -O3 -s -fomit-frame-pointer -msse3 -march=core2

Python:
  ActivePython 2.6.1.1 (r261:67515, Dec 5 2008, 13:58:38) [MSC v.1500 32 bit (Intel)] on win32
  Psyco for Python 2.6, V.1.6.0 final 0

> I'm too much of a D noobie myself to attempt this.

training to learn D1, for you.

-----------------------

import std.c.stdio: fopen, fprintf, fwrite, fclose, FILE;
import std.c.stdlib: rand;
import std.math: sqrt, cos, sin, PI;

const int RAND_MAX = short.max;
const int WIDTH = 256;
const int HEIGHT = 256;
const int NSUBSAMPLES = 2;
const int NAO_SAMPLES = 8;

double fpRand() { // if not available
    return rand() / cast(double)RAND_MAX;
}

struct Vec3 {
    double x, y, z;

    static double dot(ref Vec3 v0, ref Vec3 v1) {
        return v0.x * v1.x + v0.y * v1.y + v0.z * v1.z;
    }

    static void cross(ref Vec3 v0, ref Vec3 v1, out Vec3 c) {
        c.x = v0.y * v1.z - v0.z * v1.y;
        c.y = v0.z * v1.x - v0.x * v1.z;
        c.z = v0.x * v1.y - v0.y * v1.x;
    }

    static void normalize(ref Vec3 c) {
        double length = sqrt(dot(c, c));
        if (length > 1e-17) {
            c.x /= length;
            c.y /= length;
            c.z /= length;
        }
    }
}

struct RayIntersection {
    Vec3 rayPosition, rayDirection;
    double distance;
    Vec3 hitPosition, normal;
    bool isHit;
}

struct Sphere {
    Vec3 center;
    double radius;

    void intersects(ref RayIntersection isect) {
        Vec3 rs = Vec3(isect.rayPosition.x - center.x,
                       isect.rayPosition.y - center.y,
                       isect.rayPosition.z - center.z);
        double B = Vec3.dot(rs, isect.rayDirection);
        double C = Vec3.dot(rs, rs) - radius * radius;
        double D = B * B - C;
        if (D > 0.0) {
            double t = -B - sqrt(D);
            if (t > 0.0 && t < isect.distance) {
                isect.distance = t;
                isect.isHit = true;
                isect.hitPosition.x = isect.rayPosition.x + isect.rayDirection.x * t;
                isect.hitPosition.y = isect.rayPosition.y + isect.rayDirection.y * t;
                isect.hitPosition.z = isect.rayPosition.z + isect.rayDirection.z * t;
                isect.normal.x = isect.hitPosition.x - center.x;
                isect.normal.y = isect.hitPosition.y - center.y;
                isect.normal.z = isect.hitPosition.z - center.z;
                Vec3.normalize(isect.normal);
            }
        }
    }
}

struct Plane {
    Vec3 position, normal;

    void intersects(ref RayIntersection isect) {
        double d = -Vec3.dot(position, normal);
        double v = Vec3.dot(isect.rayDirection, normal);
        if (-1e-17 < v && v < 1e-17)
            return;
        double t = -(Vec3.dot(isect.rayPosition, normal) + d) / v;
        if (t > 0.0 && t < isect.distance) {
            isect.distance = t;
            isect.isHit = true;
            isect.hitPosition.x = isect.rayPosition.x + isect.rayDirection.x * t;
            isect.hitPosition.y = isect.rayPosition.y + isect.rayDirection.y * t;
            isect.hitPosition.z = isect.rayPosition.z + isect.rayDirection.z * t;
            isect.normal = normal;
        }
    }
}

Sphere[3] spheres;
Plane plane;

Vec3[] getOrthoBasis(ref Vec3 normal) {
    auto orthoBasis = new Vec3[3];
    orthoBasis[2] = normal;
    orthoBasis[1].x = 0.0;
    orthoBasis[1].y = 0.0;
    orthoBasis[1].z = 0.0;

    if (normal.x < 0.6 && normal.x > -0.6) {
        orthoBasis[1].x = 1.0;
    } else if (normal.y < 0.6 && normal.y > -0.6) {
        orthoBasis[1].y = 1.0;
    } else if (normal.z < 0.6 && normal.z > -0.6) {
        orthoBasis[1].z = 1.0;
    } else {
        orthoBasis[1].x = 1.0;
    }

    Vec3.cross(orthoBasis[1], orthoBasis[2], orthoBasis[0]);
    Vec3.normalize(orthoBasis[0]);
    Vec3.cross(orthoBasis[2], orthoBasis[0], orthoBasis[1]);
    Vec3.normalize(orthoBasis[1]);
    return orthoBasis;
}

void getAmbientOcclusion(ref RayIntersection isect, out Vec3 ambientOcclusion) {
    int ntheta = NAO_SAMPLES;
    int nphi = NAO_SAMPLES;
    const double eps = 0.0001;

    RayIntersection occIsect;
    occIsect.rayPosition.x = isect.hitPosition.x + eps * isect.normal.x;
    occIsect.rayPosition.y = isect.hitPosition.y + eps * isect.normal.y;
    occIsect.rayPosition.z = isect.hitPosition.z + eps * isect.normal.z;

    auto basis = getOrthoBasis(isect.normal);
    int hitCount;

    for (int j = 0; j < ntheta; j++) {
        for (int i = 0; i < nphi; i++) {
            double theta = sqrt(fpRand());
            double phi = 2.0 * PI * fpRand();
            double x = cos(phi) * theta;
            double y = sin(phi) * theta;
            double z = sqrt(1.0 - theta * theta);

            occIsect.rayDirection.x = x * basis[0].x + y * basis[1].x + z * basis[2].x;
            occIsect.rayDirection.y = x * basis[0].y + y * basis[1].y + z * basis[2].y;
            occIsect.rayDirection.z = x * basis[0].z + y * basis[1].z + z * basis[2].z;
            occIsect.distance = 1.0e+17;
            occIsect.isHit = false;

            spheres[0].intersects(occIsect);
            spheres[1].intersects(occIsect);
            spheres[2].intersects(occIsect);
            plane.intersects(occIsect);

            if (occIsect.isHit)
                hitCount++;
        }
    }

    double occlusionRatio = cast(double)(ntheta * nphi - hitCount) / cast(double)(ntheta * nphi);
    ambientOcclusion.x = occlusionRatio;
    ambientOcclusion.y = occlusionRatio;
    ambientOcclusion.z = occlusionRatio;
}

ubyte clamp(double value) {
    int i = cast(int)(value * 255.5);
    if (i > 255)
        i = 255;
    else if (i < 0)
        i = 0;
    return cast(ubyte)i;
}

void render(ubyte[] byteImage, int width, int height, int numberOfSubSamples) {
    auto fimg = new double[width * height * 3];
    fimg[] = 0.0;

    RayIntersection isect;
    isect.rayPosition.x = 0.0;
    isect.rayPosition.y = 0.0;
    isect.rayPosition.z = 0.0;

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            for (int v = 0; v < numberOfSubSamples; v++) {
                for (int u = 0; u < numberOfSubSamples; u++) {
                    isect.rayDirection.x = (x + (u / cast(double)numberOfSubSamples) - (width / 2.0)) / (width / 2.0);
                    isect.rayDirection.y = -(y + (v / cast(double)numberOfSubSamples) - (height / 2.0)) / (height / 2.0);
                    isect.rayDirection.z = -1.0;
                    isect.distance = 1.0e+17;
                    isect.isHit = false;
                    Vec3.normalize(isect.rayDirection);

                    spheres[0].intersects(isect);
                    spheres[1].intersects(isect);
                    spheres[2].intersects(isect);
                    plane.intersects(isect);

                    if (isect.isHit) {
                        Vec3 ambientOcclusion;
                        getAmbientOcclusion(isect, ambientOcclusion);
                        fimg[3 * (y * width + x) + 0] += ambientOcclusion.x;
                        fimg[3 * (y * width + x) + 1] += ambientOcclusion.y;
                        fimg[3 * (y * width + x) + 2] += ambientOcclusion.z;
                    }
                }
            }

            fimg[3 * (y * width + x) + 0] /= cast(double)(numberOfSubSamples * numberOfSubSamples);
            fimg[3 * (y * width + x) + 1] /= cast(double)(numberOfSubSamples * numberOfSubSamples);
            fimg[3 * (y * width + x) + 2] /= cast(double)(numberOfSubSamples * numberOfSubSamples);

            byteImage[3 * (y * width + x) + 0] = clamp(fimg[3 * (y * width + x) + 0]);
            byteImage[3 * (y * width + x) + 1] = clamp(fimg[3 * (y * width + x) + 1]);
            byteImage[3 * (y * width + x) + 2] = clamp(fimg[3 * (y * width + x) + 2]);
        }
    }
}

void setupScene() {
    spheres[0].center.x = -2.0;
    spheres[0].center.y = 0.0;
    spheres[0].center.z = -3.5;
    spheres[0].radius = 0.5;

    spheres[1].center.x = -0.5;
    spheres[1].center.y = 0.0;
    spheres[1].center.z = -3.0;
    spheres[1].radius = 0.5;

    spheres[2].center.x = 1.0;
    spheres[2].center.y = 0.0;
    spheres[2].center.z = -2.2;
    spheres[2].radius = 0.5;

    plane.position.x = 0.0;
    plane.position.y = -0.5;
    plane.position.z = 0.0;
    plane.normal.x = 0.0;
    plane.normal.y = 1.0;
    plane.normal.z = 0.0;
}

void savePPM(char* fname, int w, int h, ubyte* img) {
    FILE *fp;
    fp = fopen(fname, "wb");
    assert(fp);
    fprintf(fp, "P6\n");
    fprintf(fp, "%d %d\n", w, h);
    fprintf(fp, "255\n");
    fwrite(img, w * h * 3, 1, fp);
    fclose(fp);
}

void main() {
    auto img = new ubyte[WIDTH * HEIGHT * 3];
    setupScene();
    render(img, WIDTH, HEIGHT, NSUBSAMPLES);
    savePPM("ao3_d.ppm".ptr, WIDTH, HEIGHT, img.ptr);
}
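For readers following the sampling step in getAmbientOcclusion: the theta/phi computation appears to be the usual "pick a point on the unit disk, then project it up onto the hemisphere" trick, which gives cosine-weighted directions. Below is a minimal sketch of just that step in present-day D2, not the D1 of the post. The name sampleHemisphere and the use of std.random in place of C's rand() are assumptions made for illustration, not part of the posted program.

```d
import std.math : sqrt, cos, sin, PI;
import std.random : Random, uniform01;
import std.stdio : writefln;

struct Vec3 { double x, y, z; }

/// Draws one direction on the unit hemisphere around +z.
/// Mirrors the theta/phi lines of the posted getAmbientOcclusion;
/// `r` here is what the post calls `theta` (a disk radius, not an angle).
Vec3 sampleHemisphere(ref Random rng)
{
    immutable r = sqrt(uniform01(rng));         // radius on the unit disk
    immutable phi = 2.0 * PI * uniform01(rng);  // angle around the z axis
    // Project the disk point up onto the hemisphere: x^2 + y^2 + z^2 = 1.
    return Vec3(cos(phi) * r, sin(phi) * r, sqrt(1.0 - r * r));
}

void main()
{
    auto rng = Random(42); // fixed seed, chosen arbitrarily
    foreach (i; 0 .. 4)
    {
        auto d = sampleHemisphere(rng);
        writefln("%.4f %.4f %.4f", d.x, d.y, d.z);
    }
}
```

In the posted code the resulting (x, y, z) is then rotated into the surface's frame using the three vectors returned by getOrthoBasis.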
Mar 24 2009
After replacing "double" with "float" throughout the C and D programs, the output image is the same, but the timings change:

Timings, seconds:
  Python: 193.6
  Psyco: 49.80
  D2: 6.30
  D3: 4.22
  C2 gcc: 4.06 (float)
  C gcc: 3.98
  C llvm-gcc: 3.84
  C2 llvm-gcc: 3.73 (float)
  D3b: 3.62 (float)

Using both cores of my CPU, the timings would probably halve. Perhaps someone here can suggest the changes needed for that code to use 2/4 cores (on Windows).

Be careful: the GPUAO versions, which are supposed to run in 0.01 s on the GPU, may contain a virus.

Bye,
bearophile
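On the question of using 2/4 cores: one possible shape for this, sketched with std.parallelism from the current D2 standard library (an assumption, since it was not available with the DMD v1.041 compiler used for the timings above). The function shade is a hypothetical stand-in for the per-pixel work of render(), and renderRows is likewise an illustrative name.

```d
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical stand-in for the per-pixel work done inside render().
ubyte shade(int x, int y)
{
    return cast(ubyte)((x ^ y) & 0xFF);
}

void renderRows(ubyte[] image, int width, int height)
{
    // Each row writes a disjoint part of `image`, so rows can be handed
    // to separate worker threads without any locking.
    foreach (y; parallel(iota(height)))
    {
        foreach (x; 0 .. width)
            image[y * width + x] = shade(x, y);
    }
}

void main()
{
    enum W = 256, H = 256;
    auto img = new ubyte[W * H];
    renderRows(img, W, H);
    writeln("rendered ", img.length, " pixels");
}
```

Note that the posted program calls C's rand(), which relies on shared state. A threaded version would probably need one random generator per worker, and its output image would no longer match the single-threaded one sample for sample.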
Mar 24 2009
I give advice to new Python programmers. They are often people coming from Java, and they write "Java in Python". One of the many things they do wrong is to use classes for everything and to build deep class trees. The result is code that is much too slow and too long.

The D language isn't very widespread, so many new D programmers may come from Java, and D1 code can look quite similar to basic Java. So they may write Java-style D code.

I have naively translated some small programs from Java to D1, and the result often runs slower. This shows that D compilers have a lot of catching up to do on optimizations compared with HotSpot (I think two important "optimizations" of HotSpot are its efficient garbage collector and its ability to fully or partially inline many virtual methods).

One of those tiny 3D benchmarks shows what I mean:
http://leonardo-m.livejournal.com/79346.html

A few of those timings, in seconds:
  ao_d, float: 3.67 s
  AO.java, float, naive: 6.81 s
  ao2_py with Psyco: 16.72 s
  ao2_d, float, naive: 33.77 s

(Note that I have seen a C++ version that uses threads and runs in about 1 second on my PC, which has 2 cores, so it's about two times faster than the D code. The code doesn't differ much among translations, because the purpose of this benchmark is to compare languages, not to compare skills in manually optimizing code.)

ao_d is a fast D version of mine that uses structs and a lot of references and hidden mutation. It's not easy to translate this version to "clean" and bug-free Java code because of the hidden mutations. Despite its speed, I usually don't like this style of programming, because all those references and hidden mutations lead to code that is hard to debug and hard to translate to other languages (and sooner or later I have to translate a lot of the programs I write).

AO.java is a quite naive Java adaptation of the original Processing version.

The ao2_d version is a direct D translation of the Java version. It's clean, easy to understand, easy to debug and easy enough to translate to other languages. It shows that D code written in a Java-like style can sometimes be very slow. Profiling the ao2_d code seems to show that part of the slowdown comes from virtual methods like dot that aren't inlined.

This little program also shows why Java is so widespread: it's easy to write correct and readable programs that aren't that much slower (and it's probably not too difficult to add threads to this Java program, making it about as fast as my faster D version).

So D style guides have to warn new D programmers coming from Java that some of the optimizations they used to rely on aren't available, so they have to change their style and code if they want to keep or gain performance.

(If someone is willing, I'd like to know the timings for the ao2_py version with Psyco on a CPU with 3+ cores, and for the D code compiled with LDC.)

Bye,
bearophile
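To make the point about virtual methods concrete, here is a small sketch of the same dot product written three ways. The type names VecClass, VecFinal and VecStruct are invented for this illustration and are not taken from ao_d or ao2_d. In D, methods of a non-final class are virtual by default, while struct methods and methods of a final class are not, which leaves the compiler free to inline them.

```d
import std.stdio : writeln;

// Java style: a heap-allocated class. dot() is virtual by default,
// so a call normally goes through the vtable.
class VecClass
{
    double x, y, z;
    this(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    double dot(VecClass o) { return x * o.x + y * o.y + z * o.z; }
}

// Still a class, but `final` rules out overriding, so dot() is not virtual.
final class VecFinal
{
    double x, y, z;
    this(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    double dot(VecFinal o) { return x * o.x + y * o.y + z * o.z; }
}

// Value type: no vtable and no heap allocation per vector.
struct VecStruct
{
    double x, y, z;
    double dot(VecStruct o) const { return x * o.x + y * o.y + z * o.z; }
}

void main()
{
    writeln(new VecClass(1, 2, 3).dot(new VecClass(4, 5, 6)));
    writeln(new VecFinal(1, 2, 3).dot(new VecFinal(4, 5, 6)));
    writeln(VecStruct(1, 2, 3).dot(VecStruct(4, 5, 6)));
}
```

Whether a given compiler actually inlines the non-virtual calls depends on the compiler and its flags, so this only shows the language-level difference, not a measured speedup.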
Mar 27 2009
On Tue, 24 Mar 2009 11:51:31 -0400, redditor wrote:
> MiniLight: a minimal global illumination renderer (re)written in Scala, OCaml, Python, Ruby, Lua, Flex and C++ : http://www.hxa.name/minilight/
> I think it would be interesting to have a D version and compare for the float performance and maybe even lines of code.
> There is another similar "benchmark" here: http://lucille.atso-net.jp/aobench/ (from reddit)
> I'm too much of a D noobie myself to attempt this.

I'm in the process of porting it to D just for fun. Stay tuned. :)
Mar 24 2009