digitalmars.D - DMDScript vs. others

Bob (39/39) Jan 20 2005 Tested the (in)famous 'sieve' on an older P3:

Lionello Lunesu (4/4) Jan 20 2005 We should check if dmd is the only one using a real bit-array for that

Anders F Björklund (3/6) Jan 20 2005 I think you mean "wbit[]" ;-)
Bob (4/8) Jan 20 2005 The whole thing was tested using a char[] flag array,

Anders F Björklund (11/19) Jan 21 2005 So bit[] is not only buggy (append), but also slower ? :-)

Dave (113/129) Jan 20 2005 PIII, 800 Mhz, Win2K
Walter (4/6) Jan 20 2005 For max speed on D apps, use [-O -release]. Without the -release, the ar...

Bob (4/10) Jan 20 2005 Thanks for your info. With "-release" dmd compiled
Matthias Becker (3/3) Jan 21 2005 If I try to run sieve.html I get:

Walter (3/6) Jan 21 2005 It works in other browsers. I wonder what's up with Opera.

Dave (8/47) Jan 20 2005 Hey Bob,

Bob (19/91) Jan 20 2005 No typo - just global variables are the cause.

Dave (5/112) Jan 21 2005 Ok - I made the vars. global, declared the array of chars statically and

Bob <Bob_member pathlink.com> writes:

Tested the (in)famous 'sieve' on an older P3:


Results (times in secs):

2.55  vcpp
4.12  djgpp
8.36  dmc
8.83  lccW32
10.02  dmd
751.00  dms
2403.00  js



Speed relative to D language:

3.92941  vcpp
2.43204  djgpp
1.19856  dmc
1.13477  lccw32
1.00000  [*] dmd
0.01334  dms
0.00417  js
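The "speed relative to D" figures appear to be the dmd time divided by each other time, using the rounded times from the results list. Here is a minimal sketch in current D (D2) that recomputes them; `relativeSpeed` is a hypothetical helper name, not something from the original test.

```d
import std.stdio : writefln;

/// Speed relative to dmd: values above 1 are faster than dmd, below 1 slower.
double relativeSpeed(double dmdSecs, double secs)
{
    return dmdSecs / secs;
}

void main()
{
    // Rounded run times in seconds, copied from the results list above.
    immutable string[] names = ["vcpp", "djgpp", "dmc", "lccW32", "dmd", "dms", "js"];
    immutable double[] secs  = [2.55, 4.12, 8.36, 8.83, 10.02, 751.00, 2403.00];
    enum double dmdSecs = 10.02;

    foreach (i, t; secs)
        writefln("%.5f  %s", relativeSpeed(dmdSecs, t), names[i]);
}
```

For example, 10.02 / 2.55 gives the 3.92941 listed for vcpp.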



Remarks:

- Scripting-language results are extrapolated;
don't expect me to wait 2400+ seconds.
- The old dscript version 1.02 (2002-11-30) was already
much faster than JS (it would have scored 981 secs).
- VC seems to like 'sieve'; its results with other
apps were good, but not by such a huge margin.



More info:

JS (V5.6):
10 iterations, 1899 primes, elapsed time = 2403

dms (V1.03):
10 iterations, 1899 primes, elapsed time = 751

dmd (V0.111) [-O]:
10000 iterations, 1899 primes, elapsed time = 10.024

LccW32 (V3.3) [-O -p6]:
10000 iterations, 1899 primes, elapsed time = 8.833

dmc (V8.41) [-O -6]:
10000 iterations, 1899 primes, elapsed time = 8.362

DJGpp (gcc V3.43) [-O3 -march=pentium3]:
10000 iterations, 1899 primes, elapsed time = 4.12088

VCpp (V13.10.3077) [/O2 /G6]:
10000 iterations, 1899 primes, elapsed time = 2.553
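The sieve.ds source isn't shown here, but a count of 1899 matches the classic BYTE-magazine sieve. That version keeps 8191 flags for the odd numbers 3 .. 16383 only, with flag i standing for 2*i + 3. A minimal sketch of that layout in current D (D2) follows, assuming Bob's script is structured the same way; `byteSieve` is a hypothetical name.

```d
import std.stdio : writefln;

/// BYTE-style sieve: flags[i] stands for the odd number 2*i + 3,
/// so `size` flags cover the odd numbers 3 .. 2*size + 1.
int byteSieve(size_t size)
{
    auto flags = new bool[size];   // one byte per flag, like a char[] array
    flags[] = true;
    int count = 0;
    foreach (i; 0 .. size)
    {
        if (!flags[i])
            continue;
        immutable size_t prime = i + i + 3;
        // The first odd multiple still to clear is 3*prime, stored at
        // index i + prime; consecutive odd multiples are `prime` apart.
        for (size_t k = i + prime; k < size; k += prime)
            flags[k] = false;
        count++;
    }
    return count;
}

void main()
{
    // 8191 flags cover the odd numbers 3 .. 16383.
    writefln("%d primes", byteSieve(8191));
}
```

With 8191 flags this reports 1899. The shootout-style sieve.d posted later in the thread sieves 2 .. 8192 directly, so it counts 1028 instead.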

Jan 20 2005

"Lionello Lunesu" <lionello.lunesu crystalinter.remove.com> writes:

We should check if dmd is the only one using a real bit-array for that 
'flags' variable. If so, it suffers from extra shifts/ands. A byte[] should 
be tested too.
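To see the extra shifts/ands, here is a minimal sketch in current D (D2) of the same odd-only sieve with the flags packed by hand, one bit each, into 32-bit words. It is only an illustration and not how dmd actually implemented `bit[]`; `bitSieve` is a hypothetical name. Every flag access needs a shift to find the word, a shift to build the mask, and an and (plus a complement when clearing). A one-byte-per-flag array avoids that work but uses 8x the memory.

```d
import std.stdio : writefln;

/// Odd-only sieve with the flags packed one bit each into 32-bit words.
/// Flag i (the odd number 2*i + 3) lives in word i >> 5, bit i & 31.
int bitSieve(size_t size)
{
    auto words = new uint[(size + 31) / 32];
    words[] = uint.max;   // every flag starts set

    // Each access: shift for the word index, shift for the mask, then and/or.
    bool isSet(size_t i) { return (words[i >> 5] & (1u << (i & 31))) != 0; }
    void unset(size_t i) { words[i >> 5] &= ~(1u << (i & 31)); }

    int count = 0;
    foreach (i; 0 .. size)
    {
        if (!isSet(i))
            continue;
        immutable size_t prime = i + i + 3;
        for (size_t k = i + prime; k < size; k += prime)
            unset(k);
        count++;
    }
    return count;
}

void main()
{
    // 8191 flags fit in 256 words (1 KiB) instead of 8191 bytes.
    writefln("%d primes", bitSieve(8191));
}
```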

L.

Jan 20 2005

Anders F Björklund <afb algonet.se> writes:

Lionello Lunesu wrote:

 We should check if dmd is the only one using a real bit-array for that 
 'flags' variable. If so, it suffers from extra shifts/ands. A byte[] should 
 be tested too.

I think you mean "wbit[]" ;-)

--anders

Jan 20 2005

Bob <Bob_member pathlink.com> writes:

The whole thing was tested using a char[] flag array,
because bit[] arrays are indeed slower; you guessed
right.



Jan 20 2005

Anders F Björklund <afb algonet.se> writes:

Bob wrote:

We should check if dmd is the only one using a real bit-array for that 
'flags' variable. If so, it suffers from extra shifts/ands. A byte[] should 
be tested too.

 The whole thing was tested using a char[] flag array,
 because bit[] arrays are indeed slower - you were right
 guessing that.

So bit[] is not only buggy (append), but also slower? :-)

Of course, it does save memory for large bit arrays...
(while small [0|1|2|3] bit arrays are actually larger)
Just wondering if it's really worth it, especially since
it seems to come at the expense of getting a boolean type.
But since "the fundamental data type is the bit"... <sic>
and bit[] is touted as a main D feature, I guess it stays.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/12625:
 Henceforth, byte/char shall be known as a "wbit" when used as a bool
 and int/long shall similarly be known as a "dbit" when used as a bool.

--anders

PS. "char" and "long" are the C types, known as "byte" and "int" in D.

Jan 21 2005

Dave <Dave_member pathlink.com> writes:

On Thu, 20 Jan 2005 18:12:50 +0000, Bob wrote:
 
 Tested the (in)famous 'sieve' on an older P3:

<snip>
 dmd (V0.111) [-O]:
 10000 iterations, 1899 primes, elapsed time = 10.024
 
 LccW32 (V3.3) [-O -p6]:
 10000 iterations, 1899 primes, elapsed time = 8.833
 
 dmc (V8.41) [-O -6]:
 10000 iterations, 1899 primes, elapsed time = 8.362
 
 DJGpp (gcc V3.43) [-O3 -march=pentium3]:
 10000 iterations, 1899 primes, elapsed time = 4.12088
 
 VCpp (V13.10.3077) [/O2 /G6]:
 10000 iterations, 1899 primes, elapsed time = 2.553

PIII, 800 MHz, Win2K

sieve (10000):
dmd v0.111 [-O -inline -release]: 1.67 secs.
VCpp v13.10.3077 [/O2 /G6]: 1.65 secs.

ary (50000):
dmd: 2.70 secs.
VCpp: 2.72 secs.

heapsort (1000000):
dmd: 2.37 secs.
VCpp: 2.15 secs.

DMD is very competitive. I used char[] for sieve.d because the C
version did (apples to apples). I also used -release to turn off array
bounds checking for D.

The C code came from: http://dada.perl.it/shootout/

;---
sieve.d:
;---
import std.string;
void main(char[][] args)
{
    int n = args.length > 1 ? atoi(args[1]) : 1;

    char flags[8192 + 1];
    int  count;

    while(n--) {
        count = 0; 
        flags[2..length] = 1;
        for(int i = 2; i < flags.length; i++) {
            if(flags[i]) {
                // remove all multiples of prime: i
                for(int j = i + i; j < flags.length; j += i) flags[j] = 0;
                count++;
            }
        }
    }

    printf("Count: %d\n", count);
}
;---

;---
ary.d
;---
import std.string;
void main(char[][] args) {
    int n = args.length > 1 ? atoi(args[1]) : 1;

    int[] x = new int[n];
    int[] y = new int[n];

    for(int i = 0; i < n; i++) {
        x[i] = i + 1;
    }
    
    for(int k = 0; k < 1000; k++) {
        for(int i = n - 1; i >= 0; i--) {
            y[i] += x[i];
        }
    }

    printf("%d %d\n",y[0],y[y.length - 1]);
}
;---

;---
heapsort.d
;---
import std.string;

void main(char[][] args)
{
    int n = args.length > 1 ? atoi(args[1]) : 1;

    double[] ary;

    ary.length = n + 1;
    for(int i = 1; i <= n; i++) {
       ary[i] = gen_random(1);
    }

    heapsort(n, ary);

    printf("%.10g\n", ary[n]);
}

void heapsort(int n, double[] ra) {
    int i, j;
    int ir = n;
    int l = (n >> 1) + 1;
    double rra;

    for (;;) {
        if (l > 1) {
            rra = ra[--l];
        } else {
            rra = ra[ir];
            ra[ir] = ra[1];
            if (--ir == 1) {
                ra[1] = rra;
                return;
            }
        }
        i = l;
        j = l << 1;
        while (j <= ir) {
            if (j < ir && ra[j] < ra[j+1]) { ++j; }
            if (rra < ra[j]) {
                ra[i] = ra[j];
                j += (i = j);
            } else {
                j = ir + 1;
            }
        }
        ra[i] = rra;
    }
}

const int IM = 139968;
const int IA = 3877;
const int IC = 29573;

double gen_random(double max) {
    static int last = 42;
    return( max * (last = (last * IA + IC) % IM) / IM );
}

;---

- Dave

Jan 20 2005

"Walter" <newshound digitalmars.com> writes:

"Bob" <Bob_member pathlink.com> wrote in message
news:csosb2$cq8$1 digitaldaemon.com...
 dmd (V0.111) [-O]:
 10000 iterations, 1899 primes, elapsed time = 10.024

For max speed on D apps, use [-O -release]. Without -release, array
bounds checking is turned on!
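Here is a minimal sketch in current D (D2), not from the thread, showing where those checks sit. Indexing a slice (`flags[j]`) is bounds-checked unless checks are switched off. Indexing through the raw `flags.ptr` pointer never is, so it can drop the check for one hot loop without a global switch, at the cost of safety. The function names are hypothetical. In current DMD, `-release` removes bounds checks only outside `@safe` code, while `-boundscheck=off` removes them everywhere.

```d
import std.stdio : writefln;

/// Counts primes <= n with ordinary slice indexing: unless checks are
/// switched off, every flags[i] / flags[j] access is bounds-checked.
int sieveChecked(size_t n)
{
    auto flags = new ubyte[n + 1];
    flags[2 .. $] = 1;
    int count = 0;
    foreach (i; 2 .. flags.length)
    {
        if (flags[i])
        {
            for (size_t j = i + i; j < flags.length; j += i)
                flags[j] = 0;
            count++;
        }
    }
    return count;
}

/// Same loop through the raw pointer: p[j] is never bounds-checked,
/// so correctness relies entirely on the loop conditions (j < len).
int sieveUnchecked(size_t n) @system
{
    auto flags = new ubyte[n + 1];
    flags[2 .. $] = 1;
    ubyte* p = flags.ptr;
    immutable size_t len = flags.length;
    int count = 0;
    foreach (i; 2 .. len)
    {
        if (p[i])
        {
            for (size_t j = i + i; j < len; j += i)
                p[j] = 0;
            count++;
        }
    }
    return count;
}

void main()
{
    writefln("%d %d", sieveChecked(8192), sieveUnchecked(8192));
}
```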

Jan 20 2005

Bob <Bob_member pathlink.com> writes:

Thanks for the info. With "-release", dmd-compiled
programs get into the dmc league speed-wise.
Quite good for an alpha version; I am impressed!



Jan 20 2005

Matthias Becker <Matthias_member pathlink.com> writes:

If I try to run sieve.html I get: 

Opera Opera

That's all.

Jan 21 2005

"Walter" <newshound digitalmars.com> writes:

"Matthias Becker" <Matthias_member pathlink.com> wrote in message
news:csra3d$95f$1 digitaldaemon.com...
 If I try to run sieve.html I get:

 Opera Opera

 That's all.

It works in other browsers. I wonder what's up with Opera.

Jan 21 2005

Dave <Dave_member pathlink.com> writes:

Hey Bob,

I basically 'ported' this to C/C++ by wrapping the sieve.ds script in main()
and adding types to the variables.

On a P3, P4 or AMD64 the ratio of VC to DMC is about 1.2:1. I couldn't find a
way to make DMC run as slowly as posted. Are you sure there isn't a typo
somewhere?

- Dave

Jan 20 2005

Bob <Bob_member pathlink.com> writes:

No typo; global variables are the cause.

Speed improves considerably when the variables are made local
to main(). In that case I can confirm your findings.
With global variables, however, dmd, dmc and lcc
are far behind vcpp. Only djgpp comes close.

I have posted a dmd bug to the bugs forum.
If you are interested, you can check out the
'sieve' code I used there.


Remark:
After checking the assembly listings, it turns out that
vcpp fully optimizes global variables, which
may or may not be desirable. Djgpp offers the best
compromise between code and speed, while still refraining from
keeping some global variables in CPU registers.
However, good code comes at a price: djgpp
compiles much more slowly than any of the other compilers
mentioned, which makes development cycles a real test
of patience.
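To illustrate the global-versus-local point, here is a minimal sketch in current D (D2) of the shootout-style sieve written both ways. The function and variable names are hypothetical. `__gshared` makes the module-level variables behave like plain C globals, since they would otherwise be thread-local in D2. The sketch only shows the two shapes of code. Whether a compiler keeps `gCount` in a register is up to its optimizer, and no timings are implied.

```d
import std.stdio : writefln;

// __gshared makes these plain process-wide globals, like C globals;
// without it, module-level variables are thread-local in D2.
__gshared ubyte[8193] gFlags;
__gshared int gCount;

/// Sieve on global state: gFlags and gCount are visible to other code,
/// so an optimizer may choose to keep storing gCount back to memory.
void sieveGlobal()
{
    gCount = 0;
    gFlags[2 .. $] = 1;
    for (size_t i = 2; i < gFlags.length; i++)
    {
        if (gFlags[i])
        {
            for (size_t j = i + i; j < gFlags.length; j += i)
                gFlags[j] = 0;
            gCount++;
        }
    }
}

/// Same sieve on local state: nothing outside can observe `count`,
/// so it is an easy candidate for a register.
int sieveLocal()
{
    ubyte[8193] flags;   // local static array, zero-initialized
    flags[2 .. $] = 1;
    int count = 0;
    for (size_t i = 2; i < flags.length; i++)
    {
        if (flags[i])
        {
            for (size_t j = i + i; j < flags.length; j += i)
                flags[j] = 0;
            count++;
        }
    }
    return count;
}

void main()
{
    sieveGlobal();
    writefln("global: %d, local: %d", gCount, sieveLocal());
}
```

Both versions count 1028, the number of primes up to 8192.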



Jan 20 2005

Dave <Dave_member pathlink.com> writes:

OK, I made the variables global, declared the char array statically, and
came up with results much closer to yours: 3.34:1 (dmc:vc).

Thanks,

- Dave

Jan 21 2005
