digitalmars.D.learn - is D so slow?
- baleog (28/28) Jun 14 2008 Hello
- Unknown W. Brackets (6/36) Jun 14 2008 This is a frequently asked question.
- baleog (4/43) Jun 14 2008 but gdc uses gcc backend and the test programs (C and D) differs by the ...
- Unknown W. Brackets (11/59) Jun 14 2008 What about switches? Your program uses arrays; if you have array bounds...
- Jerry Quinn (4/15) Jun 15 2008 There's another classic benchmark issue that you could be stumbling over...
- Tomas Lindquist Olsen (3/33) Jun 15 2008 What switches did you use to compile? Not much info you're giving ...
- baleog (11/14) Jun 15 2008 Ubuntu-6.06
- Jarrett Billingsley (9/20) Jun 15 2008 Array bounds checking is off as long as you specify -release.
- Dave (71/94) Jun 15 2008 I agree, but nonetheless the malloc version runs much faster on my syste...
- Jarrett Billingsley (24/33) Jun 16 2008 I'm sorry, but using your code, I can't reproduce times anywhere near th...
- baleog (5/6) Jun 16 2008 Maybe it depends on hardware? And `new` effectiveness depends on used ha...
- bearophile (4/5) Jun 15 2008 With DMD you can disable array bound checking using the -release compila...
- baleog (3/38) Jun 15 2008 Thank you for your replies! I used malloc instead of new and run time wa...
- bearophile (16/20) Jun 15 2008 With a smarter use of gc.malloc you may avoid clearing items two times.....
- Fawzi Mohamed (19/21) Jun 16 2008 But you probably did not understand why... and it seems that neither
- Fawzi Mohamed (5/22) Jun 16 2008 ehm, sorry...
- Fawzi Mohamed (11/36) Jun 16 2008 I tested... and well I was actually right (I should have trusted my gut
- baleog (2/9) Jun 16 2008 but if i need evident init loop(replace constant to random initializatio...
- Fawzi Mohamed (21/38) Jun 16 2008 your loop
- Dave (1/11) Jun 16 2008 Good catch...
- Fawzi Mohamed (2/15) Jun 17 2008 thanks :)
- Robert Fraser (2/5) Jun 16 2008 If I remember right, malloc does no initialization; calloc initializes t...
- Fawzi Mohamed (11/17) Jun 17 2008 Indeed calloc is documented to always initialize to 0.
- Saaa (3/3) Jun 15 2008 baleog are you Marco? (same ip)
- Jarrett Billingsley (4/7) Jun 15 2008 They have the same IP because they both used the web interface. You'll
- baleog (3/8) Jun 16 2008 HP Compaq nx6110. Ubuntu Linux 6.06
- Saaa (3/3) Jun 16 2008 Ok,
Hello

I wrote 2 almost identical test programs (matrix multiplication), one in C
and one in D. The D program was 15 times slower! Was it my mistake or not?

Thank you

p.s. code:

void test(int n)
{
    float[] xs = new float[n*n];
    float[] ys = new float[n*n];
    for (int i = n-1; i >= 0; --i) { xs[i] = 1.0; }
    for (int i = n-1; i >= 0; --i) { ys[i] = 2.0; }
    float[] zs = new float[n*n];
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float s = 0.0;
            for (int k = 0; k < n; ++k) {
                s = s + (xs[k + (i*n)] * ys[j + (k*n)]);
            }
            zs[j + (i*n)] = s;
        }
    }
    delete xs;
    delete ys;
    delete zs;
}
Jun 14 2008
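The post above omits the driver and the exact timing method. A minimal
sketch of one way to run it, in D (the value n=500 and the use of the
shell's `time` command are assumptions, not part of the original post):

void main()
{
    test(500);
}

// build and time, e.g.:
//   dmd -O -release -inline test.d
//   time ./test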
This is a frequently asked question.

What compilers are you comparing?  They have different optimization
backends.  It could be that the C compiler or compiler flags you are using
simply perform better than the comparable D compiler and flags.

-[Unknown]

baleog wrote:
> I wrote 2 almost identical test programs (matrix multiplication), one in C
> and one in D. The D program was 15 times slower! Was it my mistake or not?
> [code snipped]
Jun 14 2008
Unknown W. Brackets Wrote:
> What compilers are you comparing?

gcc-4.0 with the latest releases of gdc and dmd 2.

> They have different optimization backends.

But gdc uses the gcc backend, and the test programs (C and D) differ by only
a couple of bytes.. I used std.gc.disable() too - nothing changed.

> [rest snipped]
Jun 14 2008
What about switches?  Your program uses arrays; if you have array bounds
checks enabled, that could easily account for the difference.

One way to see is to dump the assembly (I think there's a utility called
dumpobj included with dmd) and compare.  Obviously, it's doing something
differently - there's nothing intrinsically "slower" about the language for
sure.

Also - keep in mind that gdc doesn't take advantage of all the optimizations
that gcc is able to provide, at least at this time.  A couple of bytes can
go a long, long way if not optimized right.

-[Unknown]

baleog wrote:
> But gdc uses the gcc backend, and the test programs (C and D) differ by only
> a couple of bytes.. I used std.gc.disable() too - nothing changed.
> [rest snipped]
Jun 14 2008
Unknown W. Brackets Wrote:
> What about switches?  Your program uses arrays; if you have array bounds
> checks enabled, that could easily account for the difference.
> [rest snipped]

There's another classic benchmark issue that you could be stumbling over.
The sample code you posted throws away the results inside the function.
GCC can detect that the result of the computation is not used and optimize
everything out of existence.  That kind of difference could easily explain
the speed difference you're seeing.

If you're going to do this kind of micro-benchmark, you need to print the
result of the computation or otherwise convince the compiler you need the
result.
Jun 15 2008
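A minimal sketch of Jerry's suggestion, in D: make the result observable so
the optimizer cannot discard the whole computation. The checksum idea and
the names below are illustrative, not taken from the original programs.

import std.stdio;

float checksum(float[] zs)
{
    float s = 0.0;
    foreach (z; zs)    // using every element keeps the multiply loops alive
        s += z;
    return s;
}

// at the end of test(), before the deletes:
//     writefln("checksum = %s", checksum(zs));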
baleog wrote:
> I wrote 2 almost identical test programs (matrix multiplication), one in C
> and one in D. The D program was 15 times slower! Was it my mistake or not?
> [code snipped]

What switches did you use to compile? Not much info you're giving ...

Tomas
Jun 15 2008
Tomas Lindquist Olsen Wrote:
> What switches did you use to compile? Not much info you're giving ...
> Tomas

Ubuntu 6.06

dmd 2.0.14  - 40 sec with n=500
    dmd -O -release -inline test.d

gdc 0.24    - 32 sec
    gdmd -O -release test.d

gcc 4.0.3   - 1.5 sec
    gcc test.c

So gcc without optimization runs 20 times faster than gdc, but I can't find
how to suppress array bounds checking.
Jun 15 2008
"baleog" <maccarka yahoo.com> wrote in message news:g32umu$11kq$1 digitalmars.com...Tomas Lindquist Olsen Wrote:Array bounds checking is off as long as you specify -release. I don't know if your computer is just really, REALLY slow, but out of curiosity I tried running the D program on my computer. It completes in 1.2 seconds. Also, using malloc/free vs. new/delete shouldn't much matter in this program, because you make all of three allocations, all before any loops. The GC is never going to be called during the program.What switches did you use to compile? Not much info you're giving ...Ubuntu-6.06 dmd-2.0.14 - 40sec witth n=500 dmd -O -release -inline test.d gdc-0.24 - 32sec gdmd -O -release test.d and gcc-4.0.3 - 1.5sec gcc test.c so gcc without optimization runs 20 times faster than gdc but i can't find how to suppress array bound checking
Jun 15 2008
"Jarrett Billingsley" <kb3ctd2 yahoo.com> wrote in message news:g336hl$10c8$1 digitalmars.com..."baleog" <maccarka yahoo.com> wrote in message news:g32umu$11kq$1 digitalmars.com...I agree, but nonetheless the malloc version runs much faster on my systems (both Linux/Windows, P4 and Core2, all compiled w/ -O -inline -release). The relative performance difference gets larger as n increases: n malloc GC 100 0.094 0.328 200 0.140 1.859 300 0.203 6.094 400 0.312 14.141 500 0.547 27.625 import std.conv; void main(string[] args) { if(args.length > 1) test(toInt(args[1])); else printf("usage: mm nnn\n"); } version(malloc) { import std.c.stdlib; } void test(int n) { version(malloc) { float* xs = cast(float*)malloc(n*n*float.sizeof); float* ys = cast(float*)malloc(n*n*float.sizeof); } else { float[] xs = new float[n*n]; float[] ys = new float[n*n]; } for(int i = n-1; i>=0; --i) { xs[i] = 1.0; } for(int i = n-1; i>=0; --i) { ys[i] = 2.0; } version(malloc) { float* zs = cast(float*)malloc(n*n*float.sizeof); } else { float[] zs = new float[n*n]; } for (int i=0; i<n; ++i) { for (int j=0; j<n; ++j) { float s = 0.0; for (int k=0; k<n; ++k) { s = s + (xs[k + (i*n)] * ys[j + (k*n)]); } zs[j+ (i*n)] = s; } } version(malloc) { free(zs); free(ys); free(xs); } else { delete xs; delete ys; delete zs; } }Tomas Lindquist Olsen Wrote:Array bounds checking is off as long as you specify -release. I don't know if your computer is just really, REALLY slow, but out of curiosity I tried running the D program on my computer. It completes in 1.2 seconds. Also, using malloc/free vs. new/delete shouldn't much matter in this program, because you make all of three allocations, all before any loops. The GC is never going to be called during the program.What switches did you use to compile? Not much info you're giving ...Ubuntu-6.06 dmd-2.0.14 - 40sec witth n=500 dmd -O -release -inline test.d gdc-0.24 - 32sec gdmd -O -release test.d and gcc-4.0.3 - 1.5sec gcc test.c so gcc without optimization runs 20 times faster than gdc but i can't find how to suppress array bound checking
Jun 15 2008
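For anyone reproducing Dave's numbers: the two variants in his code are
selected at compile time with DMD's -version switch, presumably something
like `dmd -O -release -inline mm.d` for the GC build and
`dmd -O -release -inline -version=malloc mm.d` for the malloc build (the
file name mm.d is a guess based on the usage string in his main()).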
"Dave" <Dave_member pathlink.com> wrote in message news:g34sja$2m1a$1 digitalmars.com...I agree, but nonetheless the malloc version runs much faster on my systems (both Linux/Windows, P4 and Core2, all compiled w/ -O -inline -release). The relative performance difference gets larger as n increases: n malloc GC 100 0.094 0.328 200 0.140 1.859 300 0.203 6.094 400 0.312 14.141 500 0.547 27.625I'm sorry, but using your code, I can't reproduce times anywhere near that. I'm on Windows, DMD, Athlon X2 64. Here are my results: Phobos: n malloc GC ------------------------ 100 0.005206 0.005285 200 0.045083 0.045199 300 0.148954 0.148920 400 0.400136 0.404554 500 0.933754 1.076060 Tango: n malloc GC ------------------------ 100 0.005221 0.005298 200 0.045342 0.044910 300 0.150753 0.149157 400 0.402951 0.403343 500 0.946041 1.073466 Tested with both Tango and Phobos to be sure, and the times are not really any different between the two. The malloc and GC times don't really differ until n=500, and even then it's not by much.
Jun 16 2008
Jarrett Billingsley Wrote:
> I'm sorry, but using your code, I can't reproduce times anywhere near that.

Maybe it depends on hardware? And `new` effectiveness depends on the
hardware used.

my /proc/cpuinfo: Intel Celeron 1.5GHz
flags: fpu, vme, de, tsc, msr, pae, mce, cx8, apic, sep, mtrr, pge, mca,
cmov, pat, clflush, dts, acpi, mmx, fxsr, sse, sse2, ss, tm, pbe, nx
Jun 16 2008
baleog:
> i can't find how to suppress array bound checking

With DMD you can disable array bounds checking using the -release
compilation option.

Bye,
bearophile
Jun 15 2008
Thank you for your replies! I used malloc instead of new and the run time
was about 1 sec.

p.s. I'm sorry for my terrible English.

Tomas Lindquist Olsen Wrote:
> What switches did you use to compile? Not much info you're giving ...
> [rest snipped]
Jun 15 2008
baleog:

I suggest you show the complete C code and the complete D code.

> float[] xs = new float[n*n];

With a smarter use of gc.malloc you may avoid clearing the items two times
(I presume the optimizer doesn't remove the first clearing). You can use
this from the d.extra module of my libs:

import std.gc: gcmalloc = malloc, gcrealloc = realloc, hasNoPointers;

T[] NewVoidGCArray(T)(int n) {
    assert(n > 0, "NewVoidGCArray: n must be > 0.");
    auto pt = cast(T*)gcmalloc(n * T.sizeof);
    hasNoPointers(pt);
    return pt[0 .. n];
}

> for(int i = n-1; i>=0; --i) { xs[i] = 1.0; }

D arrays know a shorter and probably faster syntax:

xs[] = 1.0;

Bye,
bearophile
Jun 15 2008
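A sketch of how the helper above might be used in the benchmark, assuming
bearophile's d.extra module (or the snippet itself) is available:

float[] xs = NewVoidGCArray!(float)(n*n);
float[] ys = NewVoidGCArray!(float)(n*n);
xs[] = 1.0;   // slice assignment then touches every element exactly once
ys[] = 2.0;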
On 2008-06-15 13:53:30 +0200, baleog <maccarka yahoo.com> said:
> Thank you for your replies! I used malloc instead of new and the run time
> was about 1 sec.

But you probably did not understand why... and it seems that neither did
others around here... Indeed it is a subtle pitfall that is easy to fall
into.

When you benchmark:

1) print something that depends on the result, like the sum of everything
(it is not the main issue in this case, but doing it would probably have
shown the problem), so you also have at least a tiny chance of noticing if
your algorithm is wrong

2) NaNs

Operations involving NaNs, depending on the IEEE compliance requested from
the processor, can be 1000 times slower!!!!!!!!

D (very thoughtfully, as it makes spotting errors easier) initializes
floating point numbers with NaNs (unlike C).

-> your results follow: if you use malloc, the memory is not initialized
   with NaNs -> performance

Manual malloc in this case is definitely not called for.

Writing a benchmark can be subtle... benchmarking correct code is easier...

Fawzi
Jun 16 2008
On 2008-06-16 16:32:56 +0200, Fawzi Mohamed <fmohamed mac.com> said:
> But you probably did not understand why... and it seems that neither did
> others around here...
> [rest snipped]

ehm, sorry... You do initialize everything...

ehm, never post without testing...

Fawzi
Jun 16 2008
On 2008-06-16 16:40:16 +0200, Fawzi Mohamed <fmohamed mac.com> said:
> ehm, sorry... You do initialize everything...
> ehm, never post without testing...

I tested... and well, I was actually right (I should have trusted my gut
feeling a little more...). NaN is the culprit.

Check your algorithm: you initialize (backwards, for some strange reason)
only part of the arrays... Putting

xs[] = 1.0;
ys[] = 2.0;

instead of your strange loops solves everything...

Fawzi
Jun 16 2008
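A minimal, self-contained check of the claim above (D default-initializes
float array elements to NaN; the array length and values are arbitrary):

import std.stdio;

void main()
{
    float[] xs = new float[8];
    assert(xs[0] != xs[0]);   // NaN is the only value unequal to itself
    xs[] = 1.0;               // slice assignment initializes every element
    writefln("xs[0] = %s", xs[0]);
}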
Fawzi Mohamed Wrote:
> Check your algorithm: you initialize (backwards, for some strange reason)
> only part of the arrays... Putting
> xs[] = 1.0;
> ys[] = 2.0;
> instead of your strange loops solves everything...

But what if I need an explicit init loop (e.g. replacing the constant with
random initialization)? Did you mean that in this case I must use the
`malloc` function?
Jun 16 2008
On 2008-06-16 18:53:48 +0200, baleog <maccarka yahoo.com> said:
> But what if I need an explicit init loop (e.g. replacing the constant with
> random initialization)? Did you mean that in this case I must use the
> `malloc` function?

To quote myself:

> 2) NaNs
> Operations involving NaNs, depending on the IEEE compliance requested from
> the processor, can be 1000 times slower!!!!!!!!
> D (very thoughtfully, as it makes spotting errors easier) initializes
> floating point numbers with NaNs (unlike C).

Your loop

for(int i = n-1; i>=0; --i) { xs[i] = 1.0; }

initializes only xs[0..n], but you also have xs[n..n*n], which keeps its
default initial value (the same is true for ys). In D this default value is
NaN (which is good, as it helps you spot errors in code that you really
use, not only in benchmarks). When you use these values your program goes
very slowly if full IEEE compliance is requested from your processor (at
least on my pc).

If you use malloc, the default initialization does not take place; the
memory is normally either initialized to 0 or left uninitialized (with
values that are likely not NaN). So your program is fast with malloc, but
in fact all this is due to a bug in the program that you are benchmarking,
and using malloc is not the correct solution. The solution is to initialize
all the values that you use.

Fawzi
Jun 16 2008
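A sketch of the point above, answering the random-initialization question:
an explicit loop is fine, as long as it covers all n*n elements rather than
just the first n. An index-based value stands in for a real RNG here to
keep the sketch self-contained:

float[] xs = new float[n*n];
for (int i = n*n - 1; i >= 0; --i)   // note: n*n, not n
    xs[i] = 1.0f + (i % 7) * 0.25f;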
> If you use malloc, the default initialization does not take place; the
> memory is normally either initialized to 0 or left uninitialized (with
> values that are likely not NaN). So your program is fast with malloc, but
> in fact all this is due to a bug in the program that you are benchmarking,
> and using malloc is not the correct solution. The solution is to initialize
> all the values that you use.
>
> Fawzi

Good catch...
Jun 16 2008
On 2008-06-17 03:23:54 +0200, "Dave" <Dave_member pathlink.com> said:
> [quote of the malloc explanation snipped]
>
> Good catch...

thanks :)
Jun 17 2008
Fawzi Mohamed wrote:
> If you use malloc, the default initialization does not take place; the
> memory is normally either initialized to 0 or left uninitialized (with
> values that are likely not NaN).

If I remember right, malloc does no initialization; calloc initializes to 0.
Jun 16 2008
On 2008-06-17 04:13:14 +0200, Robert Fraser <fraserofthenight gmail.com> said:
> If I remember right, malloc does no initialization; calloc initializes to 0.

Indeed, calloc is documented to always initialize to 0.

I think that by default, when reusing memory, malloc does not initialize it
(but normally you can set environment variables to change this behaviour).
When getting memory from the system, initialization might (and often will)
take place so that a program cannot "sniff" the memory of other programs.

The whole thing is system dependent; malloc gives no guarantee of any
particular behaviour.

Fawzi
Jun 17 2008
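To illustrate the distinction Robert and Fawzi describe, using the same
std.c.stdlib bindings as Dave's version(malloc) code (a sketch, not taken
from the thread):

import std.c.stdlib;

void example(int n)
{
    // malloc: contents are unspecified -- possibly zeros, possibly stale
    // data, but not D's NaN default.
    float* a = cast(float*) malloc(n * float.sizeof);

    // calloc: guaranteed all-bits-zero, which is 0.0f for IEEE floats.
    float* b = cast(float*) calloc(n, float.sizeof);

    free(b);
    free(a);
}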
baleog, are you Marco? (same ip)
What kind of hardware do you have?
Because Marco also had some strange speed problems I couldn't replicate.
Jun 15 2008
"Saaa" <empty needmail.com> wrote in message news:g340sc$d1j$1 digitalmars.com...baleog are you Marco? (same ip) What kind of hardware do you have? Because Marco also had some strange speed problems I couldn't replicate.They have the same IP because they both used the web interface. You'll notice that everyone who uses the web interface has the same IP.
Jun 15 2008
Saaa Wrote:
> baleog, are you Marco? (same ip)

No.

> What kind of hardware do you have?

HP Compaq nx6110, Ubuntu Linux 6.06.

> Because Marco also had some strange speed problems I couldn't replicate.
Jun 16 2008
Ok, It just sounded like the same problem. Fawzi seems to have the solution :)
Jun 16 2008