D.gnu - D vs C code generation
- wscott wscott1.homeip.net (Wayne Scott) (113/113) Jul 17 2004 I was doing some tests using the gdc compiler and comparing it to gcc.
- Stephen Waits (5/6) Jul 19 2004 Interesting indeed. Can you please say which versions of dmd and gcc you...
- wscott wscott1.homeip.net (Wayne Scott) (36/42) Jul 20 2004 Ahh yes, I did leave out that information.
- Stephen Waits (5/7) Jul 20 2004 Probably wouldn't hurt to cross post it over there. It does involve DMD...
I was doing some tests using the gdc compiler and comparing it to gcc. First I created a C version of the example wc program:

    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <fcntl.h>

    /* Read the whole file into a NUL-terminated buffer (no error checking). */
    char *
    readfile(char *file)
    {
        char *ret;
        int fd;
        struct stat sb;

        stat(file, &sb);
        ret = malloc(sb.st_size + 1);
        fd = open(file, O_RDONLY);
        read(fd, ret, sb.st_size);
        ret[sb.st_size] = 0;
        close(fd);
        return (ret);
    }

    int
    main (int ac, char **av)
    {
        int w_total = 0;
        int l_total = 0;
        int c_total = 0;
        int i;

        printf ("   lines   words   bytes file\n");
        for (i = 1; i < ac; i++) {
            char *input;
            int w_cnt = 0, l_cnt = 0, c_cnt = 0;
            int inword = 0;
            char *p;

            input = readfile(av[i]);
            p = input;
            while (*p) {
                if (*p == '\n') ++l_cnt;
                /* Only a space ends a word. */
                if (*p != ' ') {
                    if (!inword) {
                        inword = 1;
                        ++w_cnt;
                    }
                } else {
                    inword = 0;
                }
                ++c_cnt;
                ++p;
            }
            free(input);
            printf ("%8u%8u%8u %s\n", l_cnt, w_cnt, c_cnt, av[i]);
            l_total += l_cnt;
            w_total += w_cnt;
            c_total += c_cnt;
        }
        if (ac > 2) {
            printf ("--------------------------------------\n"
                "%8u%8u%8u total\n",
                l_total, w_total, c_total);
        }
        return (0);
    }

Then I compiled both versions with -O2 and used cachegrind to find out exactly how many instructions each one needed to run. Here are the results with the C version first.
(This is over 2 megs of C source)

    $ valgrind --tool=cachegrind ./wc_c ~/bk/bk-3.3.x/src/*.c > /dev/null
    ==3349== I   refs:      22,529,481
    ==3349== I1  misses:           784
    ==3349== L2i misses:           778
    ==3349== I1  miss rate:        0.0%
    ==3349== L2i miss rate:        0.0%
    ==3349==
    ==3349== D   refs:       2,393,366  (2,262,770 rd +   130,596 wr)
    ==3349== D1  misses:        10,159  (    9,671 rd +       488 wr)
    ==3349== L2d misses:         9,680  (    9,315 rd +       365 wr)
    ==3349== D1  miss rate:        0.4% (      0.4%   +       0.3%  )
    ==3349== L2d miss rate:        0.4% (      0.4%   +       0.2%  )
    ==3349==
    ==3349== L2 refs:           10,943  (   10,455 rd +       488 wr)
    ==3349== L2 misses:         10,458  (   10,093 rd +       365 wr)
    ==3349== L2 miss rate:         0.0% (      0.0%   +       0.2%  )

    farm Dlang $ valgrind --tool=cachegrind ./wc_d ~/bk/bk-3.3.x/src/*.c > /dev/null
    ==3351== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
    ==3351== I   refs:      29,081,497
    ==3351== I1  misses:         1,216
    ==3351== L2i misses:         1,199
    ==3351== I1  miss rate:        0.0%
    ==3351== L2i miss rate:        0.0%
    ==3351==
    ==3351== D   refs:       4,891,118  (3,663,754 rd + 1,227,364 wr)
    ==3351== D1  misses:        61,871  (   24,677 rd +    37,194 wr)
    ==3351== L2d misses:        60,880  (   23,757 rd +    37,123 wr)
    ==3351== D1  miss rate:        1.2% (      0.6%   +       3.0%  )
    ==3351== L2d miss rate:        1.2% (      0.6%   +       3.0%  )
    ==3351==
    ==3351== L2 refs:           63,087  (   25,893 rd +    37,194 wr)
    ==3351== L2 misses:         62,079  (   24,956 rd +    37,123 wr)
    ==3351== L2 miss rate:         0.1% (      0.0%   +       3.0%  )

As you can see, the D version of the code used about 30% more instructions and about 100% more data accesses. (BTW the system wc program was a lot slower than both of these...) That is not too bad for the benefits, but I was hoping they would be closer. Originally I was seeing MUCH different results, but I was using smaller input sets. D has a much higher startup overhead.

Next I tried making the D code look like my C version, without the dynamic arrays and just using pointers. It didn't really change the numbers at all. Also, adding -fno-bounds-check didn't help. That is a good sign because it means that the array code generates the same code you would write using pointers.
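To make the array-versus-pointer comparison concrete, here is a minimal sketch in C of the same word-counting rule written both ways. The function names `count_words_indexed` and `count_words_pointer` are hypothetical and do not come from the post, and the D source that was actually measured is not shown in this thread, so this only illustrates the two styles being compared. It does not verify what any particular compiler generates for them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not from the original post.  Both apply the same
 * rule as the wc loop above: only a space character ends a word. */

/* Array style: index into a buffer of known length. */
unsigned count_words_indexed(const char *buf, size_t len)
{
    unsigned words = 0;
    int inword = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++) {
        if (buf[i] != ' ') {
            if (!inword) {
                inword = 1;
                ++words;
            }
        } else {
            inword = 0;
        }
    }
    return words;
}

/* Pointer style: walk a pointer until the terminating NUL. */
unsigned count_words_pointer(const char *p)
{
    unsigned words = 0;
    int inword = 0;

    assert(p != NULL);
    while (*p) {
        if (*p != ' ') {
            if (!inword) {
                inword = 1;
                ++words;
            }
        } else {
            inword = 0;
        }
        ++p;
    }
    return words;
}
```

Both functions return the same count for the same text. Note that a newline does not end a word under this rule, so "a\nb" counts as one word, just as it does in the loop above.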
Anyway I thought the result was interesting... -Wayne
Jul 17 2004
Wayne Scott wrote:
> Anyway I thought the result was interesting...

Interesting indeed. Can you please say which versions of dmd and gcc you used?

Thanks, Steve
Jul 19 2004
In article <cdhcbc$no3$2 digitaldaemon.com>, Stephen Waits <steve waits.net> wrote:
> Wayne Scott wrote:
>> Anyway I thought the result was interesting...
> Interesting indeed. Can you please say which versions of dmd and gcc you used? Thanks, Steve

Ahh yes, I did leave out that information. I used release 1f of the D gcc frontend from here: http://home.earthlink.net/~dvdfrdmn/d/ built on top of GCC 3.3.4, and compared it to that same gcc.

I tried rebuilding with the Linux version of the official compiler and I get this result with -O -release:

    ==22599== Cachegrind, an I1/D1/L2 cache profiler for x86-linux.
    ==22599== Copyright (C) 2002-2004, and GNU GPL'd, by Nicholas Nethercote.
    ==22599== Using valgrind-2.1.1, a program supervision framework for x86-linux.
    ==22599== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward.
    ==22599== For more details, rerun with: -v
    ==22599==
    ==22599==
    ==22599== I   refs:      23,711,446
    ==22599== I1  misses:         1,066
    ==22599== L2i misses:         1,055
    ==22599== I1  miss rate:        0.0%
    ==22599== L2i miss rate:        0.0%
    ==22599==
    ==22599== D   refs:       7,230,055  (6,404,429 rd +   825,626 wr)
    ==22599== D1  misses:        48,292  (   10,964 rd +    37,328 wr)
    ==22599== L2d misses:        46,787  (    9,685 rd +    37,102 wr)
    ==22599== D1  miss rate:        0.6% (      0.1%   +       4.5%  )
    ==22599== L2d miss rate:        0.6% (      0.1%   +       4.4%  )
    ==22599==
    ==22599== L2 refs:           49,358  (   12,030 rd +    37,328 wr)
    ==22599== L2 misses:         47,842  (   10,740 rd +    37,102 wr)
    ==22599== L2 miss rate:         0.1% (      0.0%   +       4.4%  )

That is similar to the number of instructions in the C version, but over 3X the number of D refs. The number of D1 misses was the same so the extra loads and stores were probably all on the stack.

-Wayne

PS: Does anyone read this newsgroup or should I have posted this stuff to the digitalmars.D newsgroup?
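For readers who want to check the percentages quoted in this thread, here is a small sketch of the arithmetic in C. The constants are copied from the cachegrind "I refs" and "D refs" lines above; the enum names and the helpers `percent_more` and `times_as_many` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Reference counts copied from the cachegrind output quoted in the thread.
 * The names are made up for this sketch. */
enum {
    C_I_REFS   = 22529481,  /* gcc build of the C version */
    C_D_REFS   = 2393366,
    GDC_I_REFS = 29081497,  /* gdc build of the D version */
    GDC_D_REFS = 4891118,
    DMD_I_REFS = 23711446,  /* dmd -O -release build of the D version */
    DMD_D_REFS = 7230055
};

/* Percentage by which `other` exceeds `base`; 100.0 means twice as many. */
double percent_more(unsigned long base, unsigned long other)
{
    assert(base > 0);
    return ((double)other - (double)base) * 100.0 / (double)base;
}

/* Plain ratio of `other` to `base`; 3.0 means three times as many. */
double times_as_many(unsigned long base, unsigned long other)
{
    assert(base > 0);
    return (double)other / (double)base;
}
```

With these figures, `percent_more(C_I_REFS, GDC_I_REFS)` comes to roughly 29 and `percent_more(C_D_REFS, GDC_D_REFS)` to roughly 104, which matches the "30% more instructions and 100% more data accesses" summary for gdc. For dmd, `percent_more(C_I_REFS, DMD_I_REFS)` is roughly 5 and `times_as_many(C_D_REFS, DMD_D_REFS)` is just over 3, consistent with "over 3X the number of D refs".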
Jul 20 2004
Wayne Scott wrote:
> PS: Does anyone read this newsgroup or should I have posted this stuff to the digitalmars.D newsgroup?

Probably wouldn't hurt to cross-post it over there. It does involve DMD in addition to gcc, so it's on-topic in both groups. You'll definitely get more response in the main group.

--Steve
Jul 20 2004