digitalmars.D.learn - Small part of a program : d and c versions performances diff.
- Larry (92/92) Jul 09 2014 Hello,
- NCrashed (35/127) Jul 09 2014 Clock isn't an accurate benchmark instrument. Try
- bearophile (121/128) Jul 09 2014 Your C code is not equivalent to the D code, there are small
- Larry (13/143) Jul 09 2014 You are definitely right, I did mess up while translating !
- bearophile (8/14) Jul 09 2014 If you run it on very low powered hardware then you may not need
- John Colvin (8/208) Jul 09 2014 Could you provide the exact code you are using for that
- Larry (14/14) Jul 09 2014 Yes you are perfectly right but our need is to run the fastest
- Larry (5/19) Jul 09 2014 @John Colvin :
- bearophile (9/13) Jul 09 2014 Have you benchmarked the D code without starting the current
- Larry (6/6) Jul 09 2014 @Bearophile: just tried. No dramatic change.
- bearophile (10/16) Jul 09 2014 That just means disabling the GC, so the start time is the same.
- John Colvin (6/10) Jul 09 2014 You say you are worried about microseconds and power consumption,
- "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (2/5) Jul 09 2014 Not much overhead if you don't use a MMU and use static linking.
- Larry (15/25) Jul 09 2014 @John : A new process ? Where ?
- Chris (5/33) Jul 09 2014 I wouldn't give up on D (as you've already signalled). It's
- John Colvin (8/36) Jul 09 2014 process == program in this case. Launching a new process ==
- Kapps (3/17) Jul 09 2014 This to me pretty much confirms that almost the entirety of your
- Larry (10/10) Jul 09 2014 The actual code is not that much slower according to the numerous
- Ali Çehreli (5/9) Jul 09 2014 Changing the topic a little, the calculation above ignores the tv_sec
- Larry (5/15) Jul 09 2014 Absolutely Ali because I know it is under the sec range. I made
- Ali Çehreli (6/22) Jul 09 2014 I know it did work and will work every time you test it. :)
- Larry (1/1) Jul 09 2014 Right
- Kapps (19/19) Jul 09 2014 Measure a larger number of loops. I understand you're concerned
Hello,

I extracted a part of my code written in C. It is deliberately useless here, but I would like to understand the different techniques for optimizing this kind of code with the gdc compiler. It currently runs in under a microsecond.

Constraint: the way the code is expressed cannot be changed much. We need that double loop because there are other operations involved in the first loop's scope.

main.c:

[code]
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "jol.h"
#include <time.h>
#include <sys/time.h>

int main(void)
{
    struct timeval s, e;
    gettimeofday(&s, NULL);

    int pol = 5;
    tes(&pol);

    int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};
    int len = 13 - 1;
    int g = 0;
    for (int x = 36; x >= 0; --x) {
        // some code here erased for the test
        for (int y = len; y >= 0; --y) {
            // some other code here
            ++g;
            arr[y] += 1;
        }
    }

    gettimeofday(&e, NULL);
    printf("so ? %d %lu %d %d %d", g, e.tv_usec - s.tv_usec, arr[4], arr[9], pol);
    return 0;
}
[/code]

jol.c:

[code]
void tes(int * restrict a) {
    *a = 9;
}
[/code]

and jol.h:

[code]
#ifndef JOL_H
#define JOL_H
void tes(int * restrict a);
#endif // JOL_H
[/code]

Now, the D counterpart:

[code]
module main;

import std.stdio;
import std.datetime;
import jol;

int main(string[] args)
{
    auto currentTime = Clock.currTime();

    int pol = 5;
    tes(pol);
    pol = 8;

    int arr[] = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
    int len = 13 - 1;
    int g = 0;
    for (int x = 31; x >= 0; --x) {
        for (int y = len; y >= 0; --y) {
            ++g;
            arr[y] += 1;
        }
    }

    auto currentTime2 = Clock.currTime();
    writefln("Hello World %d %s %d %d\n", g, (currentTime2 - currentTime), arr[4], arr[9]);
    return 0;
}
[/code]

and:

[code]
module jol;

final void tes(ref int a) {
    a = 9;
}
[/code]

Ok, the compilation options:

gdc hello.d jol.d -O3 -frelease -ftree-loop-optimize
gcc -march=native -std=c11 -O2 main.c jol.c

Now the performance:

D : 12 µs
C : < 1 µs

Where does the diff come from? Is there a way to optimize the D version?

Again, I am absolutely new to D and those are my very first lines of code with it.

Thanks
Jul 09 2014
On Wednesday, 9 July 2014 at 10:57:33 UTC, Larry wrote:
> Now the performance:
> D : 12 µs
> C : < 1 µs
> Where does the diff come from? Is there a way to optimize the D version?

Clock isn't an accurate benchmark instrument. Try std.datetime.benchmark:

```
module main;

import std.stdio;
import std.datetime;

void tes(ref int a) {
    a = 9;
}

int[] arr = [9,16,458,2,68,5452,98,32,4,565,78,985,3215];

void foo() {
    int pol = 5;
    tes(pol);
    pol = 8;
    int g = 0;
    foreach_reverse (x; 0..31) {
        foreach_reverse (ref a; arr) {
            ++g;
            a += 1;
        }
    }
}

void main() {
    auto res = benchmark!foo(1000); // take mean of 1000 launches
    writeln(res[0].msecs, " ", arr[4], " ", arr[9]);
}
```

Dmd time: 1 us
Gcc time: <= 1 us
Jul 09 2014
Larry:
> Now the performance:
> D : 12 µs
> C : < 1 µs
> Where does the diff come from? Is there a way to optimize the D version?

Your C code is not equivalent to the D code; there are small differences, and even the output is different. So I've cleaned up your C and D code:

------------------------

// C code.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include "jol.h"

int main() {
    struct timeval s, e;
    gettimeofday(&s, NULL);

    int pol = 5;
    tes(&pol);

    int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};
    int len = 13 - 1;
    int g = 0;
    for (int x = 36; x >= 0; --x) {
        for (int y = len; y >= 0; --y) {
            ++g;
            arr[y]++;
        }
    }

    gettimeofday(&e, NULL);
    printf("C: %d %lu %d %d %d\n", g, e.tv_usec - s.tv_usec, arr[4], arr[9], pol);
    return 0;
}

------------------------

D code ("final" functions have not much meaning here, but the D compiler is very sloppy and doesn't complain):

module jol;

void tes(ref int a) {
    a = 9;
}

---------

module maind;

void main() {
    import std.stdio;
    import std.datetime;
    import jol;

    StopWatch sw;
    sw.start;

    int pol = 5;
    tes(pol);

    int[] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
    int len = 13 - 1;
    int g = 0;
    for (int x = 36; x >= 0; --x) {
        // Some code here erased for the test.
        for (int y = len; y >= 0; --y) {
            // Some other code here.
            ++g;
            arr[y]++;
        }
    }

    sw.stop;
    writefln("D: %d %d %d %d %d", g, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

That D code is not fully idiomatic; this is closer to idiomatic D code:

module jol2;

void test(ref int x) pure nothrow @safe {
    x = 9;
}

module maind;

void main() {
    import std.stdio, std.datetime;
    import jol2;

    StopWatch sw;
    sw.start;

    int pol = 5;
    test(pol);

    int[13] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
    uint count = 0;
    foreach_reverse (immutable _; 0 .. 37) {
        foreach_reverse (ref ai; arr) {
            count++;
            ai++;
        }
    }

    sw.stop;
    writefln("D: %d %d %d %d %d", count, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

In my benchmarks I haven't used the more idiomatic D code, I have used the C-like code, but the run-time is essentially the same.

I compile the C and D code with (on a 32 bit Windows):

gcc -march=native -std=c11 -O2 main.c jol.c -o main
ldmd2 -wi -O -release -inline -noboundscheck maind.d jol.d
strip maind.exe

For the D code I've used the latest ldc2 compiler (V. 0.13.0, based on DMD v2.064 and LLVM 3.4.2); GCC is V.4.8.0 (rubenvb-4.8.0).

----------------

The C code gives as output:
C: 481 0 105 602 9

The D code gives as output:
D: 481 6076 105 602 9

----------------------

If I slow down the CPU to half speed, the C code runs in about 0.05 seconds and the D code in about 0.07 seconds. Such run times are too small to perform a sufficiently meaningful comparison; you need a run-time of about 2 seconds to get meaningful timings.

The difference between 0.05 and 0.07 is caused by initializing the D runtime (like the D GC); it takes about 0.015 seconds on my system at full CPU speed to initialize the D runtime, and it's a constant time.

Bye,
bearophile
Jul 09 2014
On Wednesday, 9 July 2014 at 12:25:40 UTC, bearophile wrote:
> Your C code is not equivalent to the D code; there are small differences, and even the output is different. So I've cleaned up your C and D code: [...]

You are definitely right, I did mess up while translating!

I ran the corrected codes (the ones I was meant to provide :S) and on a slow macbook I end up with:

C : 2
D : 15994

Of course, when run on very high-end machines this diff is almost nonexistent, but we want to run on very low-powered hardware. Ok, even with longer code there will always be a launch penalty for D, so I cannot use it for very high performance loops.

Shame for us.. :)

Thanks and bye
Jul 09 2014
Larry:
> Of course, when run on very high-end machines this diff is almost nonexistent, but we want to run on very low-powered hardware. Ok, even with longer code there will always be a launch penalty for D, so I cannot use it for very high performance loops.

If you run it on very low-powered hardware then you may not need the GC. So if you disable the run-time (stubbing out the GC), the start-up time of the D code will be smaller.

I think people here like you are really too quick at dismissing D :-)

Bye,
bearophile
Jul 09 2014
On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:
> I ran the corrected codes (the ones I was meant to provide :S) and on a slow macbook I end up with: C : 2, D : 15994

Could you provide the exact code you are using for that benchmark?

Once the program has started up you should be able to obtain performance parity between C and D. Situations where this isn't true are problems we would like to know about.

For the amount of work you are doing in the test program (almost nothing), the total runtime is probably dominated by the program load time etc. even when using C.
Jul 09 2014
Yes, you are perfectly right, but our need is to run the fastest code on the lowest-powered machines. Not servers but embedded systems. That is why I just test the overall structures. The rest of the code is numerical, so it will not change by much the fact that D cannot get back the huge launching time.

At the microsecond level (even nano) it counts, because of electrical consumption, size of hardware, heat and so on. It is definitely not something most care about, and I cannot disclose the full code for license reasons (yeah, I know I suck and generate some fuss for nothing, but.. I just execute.)

But D may be of use to us for non-critical code, to replace some Python here and there. It is definitely a good piece of engineering. And it will help save money.
Jul 09 2014
On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
> But D may be of use to us for non-critical code, to replace some Python here and there. [...]

@John Colvin: hem, you meant the sample code or the real code? If the former, it is the one corrected by Bearophile.

My excuses
Jul 09 2014
Larry:
> The rest of the code is numerical, so it will not change by much the fact that D cannot get back the huge launching time. At the microsecond level (even nano) it counts, because of electrical consumption, size of hardware, heat and so on.

Have you benchmarked the D code without starting the current D runtime (without GC)?

Is a starting time of around 0.015 seconds on an old PC a huge one? I think no one has worked a lot on decreasing this tiny time. If you care about such time, D being open source, you can take a look at the runtime starting code.

Bye,
bearophile
Jul 09 2014
@Bearophile: just tried. No dramatic change.

import core.memory;

void main() {
    GC.disable;
    ...
}
Jul 09 2014
Larry:
> @Bearophile: just tried. No dramatic change.

That just means disabling the GC, so the start time is the same. What you want is to not start the GC/runtime at all, stubbing it out... (assuming you don't need the GC in your program).

I think you can stub out the runtime functions by defining a few empty extern(C) functions, but I've never had to do it (saving 0.015 seconds is not important for my needs), so if you don't know how to do it, you have to ask others.

Bye,
bearophile
Jul 09 2014
On Wednesday, 9 July 2014 at 13:46:59 UTC, Larry wrote:
> The rest of the code is numerical, so it will not change by much the fact that D cannot get back the huge launching time. At the microsecond level (even nano) it counts, because of electrical consumption, size of hardware, heat and so on.

You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work. Surely no matter what programming language you use you would not want to work like this?
Jul 09 2014
On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:
> You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work.

Not much overhead if you don't use an MMU and use static linking.
Jul 09 2014
On Wednesday, 9 July 2014 at 14:30:41 UTC, John Colvin wrote:
> You say you are worried about microseconds and power consumption, but you are suggesting launching a new process - a lot of overhead - to do a small amount of numerical work.

@John: A new process? Where? Or maybe I got you wrong on this one, John.

I am writing libraries, and before going further I wondered if there were alternatives that I could get a grab on. The idea is to have homogeneous software, so we were ready to switch to D for the whole tasks/assets. No new process involved.

I was seeking maybe a Python-like programming language that offers C-like perfs, without so much writing as in C. Exit Cython: debugging it is a real pain, and executable size is.. well.. I am becoming lazy and seek the Holy Grail. Java not welcome.

D seemed like a very good choice, and maybe it is, or more certainly will be.
Jul 09 2014
On Wednesday, 9 July 2014 at 15:09:09 UTC, Larry wrote:
> D seemed like a very good choice, and maybe it is, or more certainly will be.

I wouldn't give up on D (as you've already signalled). It's getting better with each iteration.

BTW, have you measured the power consumption yet? Does it make a big difference if you use D or C?
Jul 09 2014
On Wednesday, 9 July 2014 at 15:09:09 UTC, Larry wrote:
> @John: A new process? Where? Or maybe I got you wrong on this one, John.

process == program in this case. Launching a new process == running the program.

The startup cost of the D runtime is only paid when you start the program. If the amount of work done per execution of the program is more than a trivial amount, then the startup cost will only be a small part of the total running time and power consumption etc.

> I am writing libraries, and before going further I wondered if there were alternatives that I could get a grab on. [...]

I think D could be a good choice for you.
Jul 09 2014
I may definitely help on the D project. I noticed that gdc doesn't have profile-guided optimization either. So yeah, I cannot use D right now, I mean for this project.

Ok, I will do my best to find some spare time for Dlang. I haven't really looked at the code yet, and I have coded for years in C, which is my first-class coding language. Hope it will not be any kind of barrier (C++ is my.. third best coding buddy anyway (after Python, excellent for managing systems)).

Many thanks to all the community. I will stick with you and see what I can bring (or cannot). :)

Bye
Jul 09 2014
@Chris: Actually yes. If we consider the device to run 20h a day, by shaving a few microseconds here and there on billions of operations a day over a whole machine park, you enable yourself to shut down some machines for maintenance more easily, or pause some of them, letting their batteries last a bit longer; the economies have proven to be in the order of thousands of $$ thanks to a redefined coding strategy.

Not even mentioning hardware usage, which is related to heat, and the savings you can expect over the long run. By changing some hardware a few months after its theoretical obsolescence, you can save a bit further. And the accountant is very happy because he can optimize the finances further (staggered repayment). It enabled us to hire more engineers/hardware.

Of course, the saving is not only on this loop but on the whole chain. And it definitely adds up $$$. And there are a lot more things involved that benefit from it (latency and so on).

Yep. :)
Jul 09 2014
On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:
> I ran the corrected codes (the ones I was meant to provide :S) and on a slow macbook I end up with: C : 2, D : 15994

This to me pretty much confirms that almost the entirety of your C code is being optimized out and thus not actually executing.
Jul 09 2014
The actual code is not that much slower, judging by the numerous other operations we do. And certainly faster than the D version doing almost nothing.

Well, it is about massive bitshifts and array accesses and calculations. With all the optimizations we are on par with Fortran numerical code (thanks -std=c11). There may be an optimization hidden somewhere, or it's just gdc having to mature. Dunno.

But don't get me wrong, D is a fantastic language.
Jul 09 2014
On 07/09/2014 03:57 AM, Larry wrote:
> struct timeval s,e;
[...]
> gettimeofday(&e,NULL);
> printf("so ? %d %lu %d %d %d",g,e.tv_usec - s.tv_usec, arr[4],arr[9],pol);

Changing the topic a little, the calculation above ignores the tv_sec members of s and e.

Ali
Jul 09 2014
On Wednesday, 9 July 2014 at 18:18:43 UTC, Ali Çehreli wrote:
> Changing the topic a little, the calculation above ignores the tv_sec members of s and e.

Absolutely, Ali, because I know it is under the sec range. I made some tests before submitting it :) But you are absolutely right, Ali; the mileage will vary in a completely different scenario.
Jul 09 2014
On 07/09/2014 12:47 PM, Larry wrote:
> Absolutely, Ali, because I know it is under the sec range. I made some tests before submitting it :)

I know it did work and will work every time you test it. :)

However, even if the difference is just one millisecond, if s and e happen to be on different sides of a second boundary, you will get a huge result.

Ali
Jul 09 2014
Measure a larger number of loops. I understand you're concerned about microseconds, but your benchmark shows nothing because your timer is simply not accurate enough for this.

The benchmark that bearophile showed, where C took ~2 nanoseconds vs the ~7000 D took, heavily implies to me that the C implementation is simply being optimized out and nothing is actually running. All inputs are known at compile time, the output is known at compile time, and the compiler is perfectly free to simply remove all your code and replace it with the result. I'm somewhat surprised that the D version doesn't do this, actually, perhaps because of the dynamic memory allocation.

I realize that you can't post your actual code, but this benchmark honestly just has too many flaws to determine anything from.

As for startup cost, D will indeed have a higher startup cost than C because of static constructors. Once it's running, it should be very close. If you're looking to start a process that will run for only a few milliseconds, you'd probably want to not use D (or avoid most static constructors, including those in the runtime / standard library).
Jul 09 2014