digitalmars.D - DMDScript vs. others
- Bob (39/39) Jan 20 2005 Tested the (in)famous 'sieve' on an older P3:
- Lionello Lunesu (4/4) Jan 20 2005 We should check if dmd is the only one using a real bit-array for that
- Anders F Björklund (3/6) Jan 20 2005 I think you mean "wbit[]" ;-)
- Bob (4/8) Jan 20 2005 The whole thing was tested using a char[] flag array,
- Anders F Björklund (11/19) Jan 21 2005 So bit[] is not only buggy (append), but also slower ? :-)
- Dave (113/129) Jan 20 2005 PIII, 800 Mhz, Win2K
- Walter (4/6) Jan 20 2005 For max speed on D apps, use [-O -release]. Without the -release, the ar...
- Bob (4/10) Jan 20 2005 Thanks for your info. With "-release" dmd compiled
- Matthias Becker (3/3) Jan 21 2005 If I try to run sieve.html I get:
- Walter (3/6) Jan 21 2005 It works in other browsers. I wonder what's up with Opera.
- Dave (8/47) Jan 20 2005 Hey Bob,
Tested the (in)famous 'sieve' on an older P3:

Results (times in secs):
     2.55  vcpp
     4.12  djgpp
     8.36  dmc
     8.83  lccW32
    10.02  dmd
   751.00  dms
  2403.00  js

Speed relative to D language:
  3.92941  vcpp
  2.43204  djgpp
  1.19856  dmc
  1.13477  lccW32
  1.00000  [*] dmd
  0.01334  dms
  0.00417  js

Remarks:
- Scripting languages results extrapolated, don't expect me to wait 2400+ seconds.
- The old dscript version 1.02 (2002-11-30) was already much faster than JS (it would have scored 981 secs).
- VC seems to like 'sieve', its test results with other apps were good but not by such a huge margin.

More info:
JS (V5.6): 10 iterations, 1899 primes, elapsed time = 2403
dms (V1.03): 10 iterations, 1899 primes, elapsed time = 751
dmd (V0.111) [-O]: 10000 iterations, 1899 primes, elapsed time = 10.024
LccW32 (V3.3) [-O -p6]: 10000 iterations, 1899 primes, elapsed time = 8.833
dmc (V8.41) [-O -6]: 10000 iterations, 1899 primes, elapsed time = 8.362
DJGpp (gcc V3.43) [-O3 -march=pentium3]: 10000 iterations, 1899 primes, elapsed time = 4.12088
VCpp (V13.10.3077) [/O2 /G6]: 10000 iterations, 1899 primes, elapsed time = 2.553
Jan 20 2005
We should check if dmd is the only one using a real bit-array for that 'flags' variable. If so, it suffers from extra shifts/ands. A byte[] should be tested too. L.
Jan 20 2005
Lionello Lunesu wrote:
> We should check if dmd is the only one using a real bit-array for that 'flags' variable. If so, it suffers from extra shifts/ands. A byte[] should be tested too.

I think you mean "wbit[]" ;-)

--anders
Jan 20 2005
The whole thing was tested using a char[] flag array, because bit[] arrays are indeed slower - you guessed right.

In article <csosm0$d8d$1 digitaldaemon.com>, Lionello Lunesu says...
> We should check if dmd is the only one using a real bit-array for that 'flags' variable. If so, it suffers from extra shifts/ands. A byte[] should be tested too. L.
Jan 20 2005
Bob wrote:
>> We should check if dmd is the only one using a real bit-array for that 'flags' variable. If so, it suffers from extra shifts/ands. A byte[] should be tested too.
> The whole thing was tested using a char[] flag array, because bit[] arrays are indeed slower - you were right guessing that.

So bit[] is not only buggy (append), but also slower ? :-)

Of course, it does save memory for large bit arrays... (while small [0|1|2|3] bit arrays are actually larger)

Just wondering if it's really worth it, especially since it seems to come at the expense of getting a boolean type.

But since "the fundamental data type is the bit"... <sic> and bit[] is touted as a main D feature, I guess it stays.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/12625:
> Henceforth, byte/char shall be known as a "wbit" when used as a bool and int/long shall similarly be known as a "dbit" when used as a bool.

--anders

PS. "char" and "long" are the C types, known as "byte" and "int" in D.
Jan 21 2005
On Thu, 20 Jan 2005 18:12:50 +0000, Bob wrote:
> Tested the (in)famous 'sieve' on an older P3:
<snip>
> dmd (V0.111) [-O]: 10000 iterations, 1899 primes, elapsed time = 10.024
> LccW32 (V3.3) [-O -p6]: 10000 iterations, 1899 primes, elapsed time = 8.833
> dmc (V8.41) [-O -6]: 10000 iterations, 1899 primes, elapsed time = 8.362
> DJGpp (gcc V3.43) [-O3 -march=pentium3]: 10000 iterations, 1899 primes, elapsed time = 4.12088
> VCpp (V13.10.3077) [/O2 /G6]: 10000 iterations, 1899 primes, elapsed time = 2.553

PIII, 800 MHz, Win2K

sieve (10000):
dmd v0.111 [-O -inline -release]: 1.67 secs.
VCpp v13.10.3077 [/O2 /G6]: 1.65 secs.

ary (50000):
dmd: 2.70 secs.
VCpp: 2.72 secs.

heapsort (1000000):
dmd: 2.37 secs.
VCpp: 2.15 secs.

DMD is very competitive.

Used char[] for sieve.d because the C version did (apples to apples). Also used -release to turn off array bound checking for D.

The C code came from: http://dada.perl.it/shootout/

;---
sieve.d:
;---
import std.string;

void main(char[][] args)
{
    int n = args.length > 1 ? atoi(args[1]) : 1;
    char flags[8192 + 1];
    int count;

    while(n--)
    {
        count = 0;
        flags[2..length] = 1;
        for(int i = 2; i < flags.length; i++)
        {
            if(flags[i])
            {
                // remove all multiples of prime: i
                for(int j = i + i; j < flags.length; j += i)
                    flags[j] = 0;
                count++;
            }
        }
    }
    printf("Count: %d\n", count);
}
;---
;---
ary.d
;---
import std.string;

void main(char[][] args)
{
    int n = args.length > 1 ? atoi(args[1]) : 1;
    int[] x = new int[n];
    int[] y = new int[n];

    for(int i = 0; i < n; i++)
    {
        x[i] = i + 1;
    }
    for(int k = 0; k < 1000; k++)
    {
        for(int i = n - 1; i >= 0; i--)
        {
            y[i] += x[i];
        }
    }
    printf("%d %d\n",y[0],y[y.length - 1]);
}
;---
;---
heapsort.d
;---
import std.string;

void main(char[][] args)
{
    int n = args.length > 1 ? atoi(args[1]) : 1;
    double[] ary;

    ary.length = n + 1;
    for(int i = 1; i <= n; i++)
    {
        ary[i] = gen_random(1);
    }
    heapsort(n, ary);
    printf("%.10g\n", ary[n]);
}

void heapsort(int n, double[] ra)
{
    int i, j;
    int ir = n;
    int l = (n >> 1) + 1;
    double rra;

    for (;;)
    {
        if (l > 1)
        {
            rra = ra[--l];
        }
        else
        {
            rra = ra[ir];
            ra[ir] = ra[1];
            if (--ir == 1)
            {
                ra[1] = rra;
                return;
            }
        }
        i = l;
        j = l << 1;
        while (j <= ir)
        {
            if (j < ir && ra[j] < ra[j+1])
            {
                ++j;
            }
            if (rra < ra[j])
            {
                ra[i] = ra[j];
                j += (i = j);
            }
            else
            {
                j = ir + 1;
            }
        }
        ra[i] = rra;
    }
}

const int IM = 139968;
const int IA = 3877;
const int IC = 29573;

double gen_random(double max)
{
    static int last = 42;
    return( max * (last = (last * IA + IC) % IM) / IM );
}
;---

- Dave
Jan 20 2005
"Bob" <Bob_member pathlink.com> wrote in message news:csosb2$cq8$1 digitaldaemon.com...
> dmd (V0.111) [-O]: 10000 iterations, 1899 primes, elapsed time = 10.024

For max speed on D apps, use [-O -release]. Without the -release, the array overflow checking is turned on!
Jan 20 2005
Thanks for your info. With "-release", dmd-compiled programs are getting into the dmc league speedwise. Quite good for an alpha version - I am impressed!

In article <csp4pk$o45$1 digitaldaemon.com>, Walter says...
> "Bob" <Bob_member pathlink.com> wrote in message news:csosb2$cq8$1 digitaldaemon.com...
>> dmd (V0.111) [-O]: 10000 iterations, 1899 primes, elapsed time = 10.024
> For max speed on D apps, use [-O -release]. Without the -release, the array overflow checking is turned on!
Jan 20 2005
If I try to run sieve.html I get: Opera Opera That's all.
Jan 21 2005
"Matthias Becker" <Matthias_member pathlink.com> wrote in message news:csra3d$95f$1 digitaldaemon.com...
> If I try to run sieve.html I get: Opera Opera That's all.

It works in other browsers. I wonder what's up with Opera.
Jan 21 2005
Hey Bob,

I basically 'ported' this to C/++ by wrapping the sieve.ds script with main() and type'ing the variables. On a P3, P4 or AMD64 the ratio of VC to DMC is about 1.2:1. I couldn't find a way to make DMC run as slow as posted - are you sure there isn't a typo somewhere?

- Dave
Jan 20 2005
No typo - global variables are the cause. Speed gets much better if the vars are made local to main(); in that case I can confirm your findings. With global variables, however, dmd, dmc and lcc are way behind vcpp. Only djgpp gets into its vicinity. I have posted a bug report against dmd in the bugs forum. If you are interested, you can check out the 'sieve' code I used from there.

Remark: After checking the assembly listings, it turns out that vcpp does full optimization on global vars, which might or might not be desirable. Djgpp strikes the best compromise between code and speed, while still refraining from keeping some global variables in CPU registers. However, good code comes at a price: Djgpp compiles way slower than any of the other compilers mentioned, which makes development cycles a real test of patience.
Jan 20 2005
Ok - I made the vars. global, declared the array of chars statically and came up with results much closer to yours: 3.34:1 (dmc:vc).

Thanks,

- Dave
Jan 21 2005