digitalmars.D.learn - Loop optimization
- kai (18/18) May 13 2010 Hello,
- Lars T. Kyllingstad (31/51) May 13 2010 Two suggestions:
- Lars T. Kyllingstad (3/16) May 13 2010 Hmm.. something very strange is going on with the line breaking here.
- Steven Schveighoffer (5/11) May 14 2010 -release implies -noboundscheck (in fact, I did not know there was a
- Lars T. Kyllingstad (7/24) May 14 2010 You are right, just checked it now. But it's strange, I thought the
- bearophile (117/130) May 14 2010 Using floating point for indexes and lengths is not a good practice. In ...
- strtr (5/6) May 14 2010 NaNs (that's the default initialization of FP values in D), and operation...
- Don (7/15) May 15 2010 Yes, nan and inf are usually the same speed. However, it's very CPU
- Don (6/10) May 15 2010 More precisely:
- Walter Bright (6/18) May 16 2010 Have to be careful when talking about floating point optimizations. For ...
- bearophile (42/45) May 17 2010 I have done a little experiment, compiling this D1 code with LDC:
- Walter Bright (16/19) May 17 2010 In my view, such switches are bad news, because:
- bearophile (7/9) May 17 2010 The Intel compiler, Microsoft compiler, GCC and LLVM have a similar swit...
- Walter Bright (3/10) May 18 2010 If I agreed with everything other vendors did with their compilers, I wo...
- Don (8/32) May 17 2010 The most glaring limitation of the FP optimiser is that it seems to
- BCS (5/12) May 17 2010 Does DMD have the groundwork for doing FP peephole optimizations? That s...
- Walter Bright (7/8) May 16 2010 This is simply false. DMD does an excellent job with integer and pointer...
- bearophile (5/9) May 16 2010 You are of course right, I understand your feelings, I am a stupid -.-
- Brad Roberts (11/23) May 16 2010 While it's false that DMD doesn't do many optimizations. It's true that...
- Joseph Wakeling (7/12) May 19 2010 Interesting to note, relative to my earlier experience with D vs. C++ sp...
- Steven Schveighoffer (24/45) May 14 2010 I figured it out.
- kai (11/26) May 14 2010 Unfortunately, I don't think I will be able to. The actual code is
- bearophile (11/14) May 14 2010 LDC is D1 still, mostly :-(
- "Jérôme M. Berger" (21/43) May 14 2010 ...lot of work to do still. And some parts of D design will need to be impro...
- div0 (18/21) May 15 2010 -----BEGIN PGP SIGNED MESSAGE-----
- "Jérôme M. Berger" (16/32) May 15 2010 According to the C89 standard and onwards it *must* be initialized
- div0 (15/26) May 16 2010 -----BEGIN PGP SIGNED MESSAGE-----
- Jouko Koski (10/15) May 16 2010 No, in C++ all *global or static* variables are zero-initialized. By
- "Jérôme M. Berger" (10/26) May 16 2010 The specs haven't diverged and C++ has mostly the same behaviour as
- Steven Schveighoffer (6/10) May 17 2010 You are probably right. All I did to figure this out is print out the
- Ali Çehreli (5/7) May 15 2010 I've discovered that this is the equivalent of the last line above:
- Simen kjaeraas (7/13) May 15 2010 Looks unintended to me. In fact (though that might be the
- Ali Çehreli (5/20) May 15 2010 I have to make a correction: It works with fixed-sized arrays. It does
- bearophile (5/6) May 15 2010 It's a compiler bug, don't use that bracket less syntax in your programs...
- Walter Bright (4/19) May 21 2010 for (int j=0;j<1e6-1;j++)
- bearophile (6/10) May 22 2010 The syntax "1e6" can represent an integer value of one million as perfec...
Hello,

I was evaluating using D for some numerical stuff. However, I was
surprised to find that looping & array indexing were not very speedy
compared to alternatives (gcc et al). I was using the DMD2 compiler on
Mac and Windows, with -O -release. Here is a boiled-down test case:

void main (string[] args)
{
    double[] foo = new double[cast(int)1e6];
    for (int i = 0; i < 1e3; i++)
    {
        for (int j = 0; j < 1e6-1; j++)
        {
            foo[j] = foo[j] + foo[j+1];
        }
    }
}

Any ideas? Am I somehow not hitting a vital compiler optimization?
Thanks for your help.
May 13 2010
On Fri, 14 May 2010 02:38:40 +0000, kai wrote:

> Hello, I was evaluating using D for some numerical stuff. However I
> was surprised to find that looping & array indexing was not very
> speedy compared to alternatives (gcc et al). I was using the DMD2
> compiler on mac and windows, with -O -release. Here is a boiled down
> test case:
>
> void main (string[] args)
> {
>     double[] foo = new double[cast(int)1e6];
>     for (int i = 0; i < 1e3; i++) {
>         for (int j = 0; j < 1e6-1; j++) {
>             foo[j] = foo[j] + foo[j+1];
>         }
>     }
> }
>
> Any ideas? Am I somehow not hitting a vital compiler optimization?
> Thanks for your help.

Two suggestions:

1. Have you tried the -noboundscheck compiler switch? Unlike C, D
checks that you do not try to read/write beyond the end of an array,
but you can turn those checks off with said switch.

2. Can you use vector operations? If the example you gave is
representative of your specific problem, then you can't, because you
are adding overlapping parts of the array. But if you are doing
operations on separate arrays, then array operations will be *much*
faster.

http://www.digitalmars.com/d/2.0/arrays.html#array-operations

As an example, compare the run time of the following code with the
example you gave:

void main ()
{
    double[] foo = new double[cast(int)1e6];
    double[] slice1 = foo[0 .. 999_998];
    double[] slice2 = foo[1 .. 999_999];
    for (int i = 0; i < 1e3; i++) {
        // BAD, BAD, BAD. DON'T DO THIS even though
        // it's pretty awesome:
        slice1[] += slice2[];
    }
}

Note that this is very bad code, since slice1 and slice2 are
overlapping arrays, and there is no guarantee as to which order the
array elements are computed -- it may even occur in parallel. It was
just an example of the speed gains you may expect from designing your
code with array operations in mind.

-Lars
May 13 2010
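[A minimal sketch of the safe pattern Lars alludes to, assuming D2 and
std.stdio; the array names are invented for illustration. When source
and destination do not overlap, the element-wise syntax is both
well-defined and fast.]

    import std.stdio;

    void main()
    {
        auto a = new double[1_000_000];
        auto b = new double[1_000_000];
        auto sum = new double[1_000_000];
        a[] = 1.0;
        b[] = 2.0;
        // Element-wise add over non-overlapping arrays: this is the
        // well-defined use of D's array operations.
        sum[] = a[] + b[];
        writeln(sum[0]); // 3
    }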
On Fri, 14 May 2010 06:31:29 +0000, Lars T. Kyllingstad wrote:

> void main ()
> {
>     double[] foo = new double[cast(int)1e6];
>     double[] slice1 = foo[0 .. 999_998];
>     double[] slice2 = foo[1 .. 999_999];
>     for (int i = 0; i < 1e3; i++) {
>         // BAD, BAD, BAD. DON'T DO THIS even though
>         // it's pretty awesome:
>         slice1[] += slice2[];
>     }
> }

Hmm.. something very strange is going on with the line breaking here.

-Lars
May 13 2010
On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad
<public kyllingen.nospamnet> wrote:

> On Fri, 14 May 2010 02:38:40 +0000, kai wrote:
>
>> I was using the DMD2 compiler on mac and windows, with -O -release.
>
> 1. Have you tried the -noboundscheck compiler switch? Unlike C, D
> checks that you do not try to read/write beyond the end of an array,
> but you can turn those checks off with said switch.

-release implies -noboundscheck (in fact, I did not know there was a
noboundscheck flag, I thought you had to use -release).

-Steve
May 14 2010
On Fri, 14 May 2010 07:32:54 -0400, Steven Schveighoffer wrote:

> On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad
> <public kyllingen.nospamnet> wrote:
>
>> On Fri, 14 May 2010 02:38:40 +0000, kai wrote:
>>
>>> I was using the DMD2 compiler on mac and windows, with -O -release.
>>
>> 1. Have you tried the -noboundscheck compiler switch? Unlike C, D
>> checks that you do not try to read/write beyond the end of an array,
>> but you can turn those checks off with said switch.
>
> -release implies -noboundscheck (in fact, I did not know there was a
> noboundscheck flag, I thought you had to use -release).
>
> -Steve

You are right, just checked it now. But it's strange, I thought the
whole point of the -noboundscheck switch was that it would be
independent of -release. But perhaps I remember wrongly (or perhaps
Walter just hasn't gotten around to it yet).

Anyway, sorry for the misinformation.

-Lars
May 14 2010
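[A small sketch of what the bounds check buys you, assuming D2: with
default flags the out-of-range read below aborts with a RangeError,
while compiling with -release (which, as noted above, implies no bounds
checking) removes the check and makes the read undefined.]

    void main()
    {
        auto a = new double[10];
        // One past the end: RangeError unless checks are disabled.
        auto x = a[10];
    }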
kai:

> I was evaluating using D for some numerical stuff.

For that evaluation you probably have to use the LDC compiler, which is
able to optimize better.

> void main (string[] args)
> {
>     double[] foo = new double[cast(int)1e6];
>     for (int i = 0; i < 1e3; i++) {
>         for (int j = 0; j < 1e6-1; j++) {
>             foo[j] = foo[j] + foo[j+1];
>         }
>     }
> }

Using floating point for indexes and lengths is not a good practice.
In D large numbers are written like 1_000_000. Use -release too.

> Any ideas? Am I somehow not hitting a vital compiler optimization?

The DMD compiler doesn't perform many optimizations, especially on
floating-point computations. But the bigger problem in your code is
that you are performing operations on NaNs (that's the default
initialization of FP values in D), and operations on NaNs are usually
quite slow.

Your code in C:

#include <stdio.h>
#include <stdlib.h>
#define N 1000000

int main() {
    double *foo = calloc(N, sizeof(double)); // malloc suffices here
    int i, j;

    for (j = 0; j < N; j++)
        foo[j] = 1.0;

    for (i = 0; i < 1000; i++)
        for (j = 0; j < N-1; j++)
            foo[j] = foo[j] + foo[j + 1];

    printf("%f", foo[N-1]);
    return 0;
}

/*
gcc -O3 -s -Wall test.c -o test
Timings, outer loop=1_000 times: 7.72 s
------------------
gcc -Wall -O3 -fomit-frame-pointer -msse3 -march=native test.c -o test
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.69 s

Just the inner loop:
.L7:
    fldl  8(%edx)
    fadd  %st, %st(1)
    fxch  %st(1)
    fstpl (%edx)
    addl  $8, %edx
    cmpl  %ecx, %edx
    jne   .L7
*/

--------------------

Your code in D1:

version (Tango)
    import tango.stdc.stdio: printf;
else
    import std.c.stdio: printf;

void main() {
    const int N = 1_000_000;
    double[] foo = new double[N];
    foo[] = 1.0;

    for (int i = 0; i < 1_000; i++)
        for (int j = 0; j < N-1; j++)
            foo[j] = foo[j] + foo[j + 1];

    printf("%f", foo[N-1]);
}

/*
dmd -O -release -inline test.d
(Not running on a VirtualBox)
Timings, outer loop=1_000 times: 9.35 s

Just the inner loop:
L34:
    fld  qword ptr 8[EDX*8][ECX]
    fadd qword ptr [EDX*8][ECX]
    fstp qword ptr [EDX*8][ECX]
    inc  EDX
    cmp  EDX,0F423Fh
    jb   L34
-----------------------
ldc -O3 -release -inline test.d
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.87 s

Just the inner loop:
.LBB1_2:
    movsd (%eax,%ecx,8), %xmm0
    addsd 8(%eax,%ecx,8), %xmm0
    movsd %xmm0, (%eax,%ecx,8)
    incl  %ecx
    cmpl  $999999, %ecx
    jne   .LBB1_2
-----------------------
ldc -unroll-allow-partial -O3 -release -inline test.d
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.75 s

Just the inner loop:
.LBB1_2:
    movsd (%eax,%ecx,8), %xmm0
    addsd 8(%eax,%ecx,8), %xmm0
    movsd %xmm0, (%eax,%ecx,8)
    movsd 8(%eax,%ecx,8), %xmm0
    addsd 16(%eax,%ecx,8), %xmm0
    movsd %xmm0, 8(%eax,%ecx,8)
    movsd 16(%eax,%ecx,8), %xmm0
    addsd 24(%eax,%ecx,8), %xmm0
    movsd %xmm0, 16(%eax,%ecx,8)
    movsd 24(%eax,%ecx,8), %xmm0
    addsd 32(%eax,%ecx,8), %xmm0
    movsd %xmm0, 24(%eax,%ecx,8)
    movsd 32(%eax,%ecx,8), %xmm0
    addsd 40(%eax,%ecx,8), %xmm0
    movsd %xmm0, 32(%eax,%ecx,8)
    movsd 40(%eax,%ecx,8), %xmm0
    addsd 48(%eax,%ecx,8), %xmm0
    movsd %xmm0, 40(%eax,%ecx,8)
    movsd 48(%eax,%ecx,8), %xmm0
    addsd 56(%eax,%ecx,8), %xmm0
    movsd %xmm0, 48(%eax,%ecx,8)
    movsd 56(%eax,%ecx,8), %xmm0
    addsd 64(%eax,%ecx,8), %xmm0
    movsd %xmm0, 56(%eax,%ecx,8)
    movsd 64(%eax,%ecx,8), %xmm0
    addsd 72(%eax,%ecx,8), %xmm0
    movsd %xmm0, 64(%eax,%ecx,8)
    addl  $9, %ecx
    cmpl  $999999, %ecx
    jne   .LBB1_2
*/

As you see, the code generated by ldc is about as good as the one
generated by gcc. There are of course other ways to optimize this
code...

Bye,
bearophile
May 14 2010
== Quote from bearophile (bearophileHUGS lycos.com)'s article

> But the bigger problem in your code is that you are performing
> operations on NaNs (that's the default initialization of FP values in
> D), and operations on NaNs are usually quite slow.

I didn't know that. Is it the same for inf? I used it as a null for
structs.
May 14 2010
strtr wrote:

> == Quote from bearophile (bearophileHUGS lycos.com)'s article
>
>> But the bigger problem in your code is that you are performing
>> operations on NaNs (that's the default initialization of FP values
>> in D), and operations on NaNs are usually quite slow.
>
> I didn't know that. Is it the same for inf? I used it as a null for
> structs.

Yes, nan and inf are usually the same speed. However, it's very CPU
dependent, and even *within* a CPU! On Pentium 4, for example, for x87,
nan is 200 times slower than a normal value (!), but on Pentium 4 SSE
there's no speed difference at all between nan and normal. I think
there's no speed difference on AMD, but I'm not sure. There's almost no
documentation on it at all.
May 15 2010
== Quote from Don (nospam nospam.com)'s article

> Yes, nan and inf are usually the same speed. However, it's very CPU
> dependent, and even *within* a CPU! On Pentium 4, for example, for
> x87, nan is 200 times slower than a normal value (!), but on Pentium
> 4 SSE there's no speed difference at all between nan and normal.

Thanks! NaNs being slower I can understand, but inf might well be a
value you want to use.
May 15 2010
strtr wrote:

> == Quote from Don (nospam nospam.com)'s article
>
>> Yes, nan and inf are usually the same speed. However, it's very CPU
>> dependent, and even *within* a CPU! On Pentium 4, for example, for
>> x87, nan is 200 times slower than a normal value (!), but on Pentium
>> 4 SSE there's no speed difference at all between nan and normal.
>
> Thanks! NaNs being slower I can understand, but inf might well be a
> value you want to use.

Yes. What's happened is that none of the popular programming languages
support special IEEE values, so they're given very low priority by chip
designers. In the Pentium 4 case, they're implemented entirely in
microcode. A 200X slowdown is really significant.

However, the bit pattern for NaN is 0xFFFF..., which is the same as a
negative integer, so an uninitialized floating-point variable has a
quite high probability of being a NaN. I'm certain there's a lot of C
programs out there which are inadvertently using NaNs.
May 15 2010
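[A rough micro-benchmark sketch of the effect Don describes, assuming a
Phobos that ships std.datetime.stopwatch (a later-Phobos assumption).
It times the same summation over a NaN-filled array and a
normal-valued array; on CPUs that handle NaN in microcode the first
loop can be dramatically slower, on others the two times are equal.]

    import std.stdio;
    import std.datetime.stopwatch : StopWatch, AutoStart;

    double sumLoop(const(double)[] a)
    {
        double s = 0.0;
        foreach (x; a)
            s += x;
        return s;
    }

    void main()
    {
        auto nans = new double[1_000_000]; // default-initialized to NaN
        auto ones = new double[1_000_000];
        ones[] = 1.0;

        auto sw = StopWatch(AutoStart.yes);
        auto s1 = sumLoop(nans);
        writeln("NaN:    ", sw.peek, " (sum = ", s1, ")");
        sw.reset();
        auto s2 = sumLoop(ones);
        writeln("normal: ", sw.peek, " (sum = ", s2, ")");
    }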
bearophile wrote:

> kai:
>> Any ideas? Am I somehow not hitting a vital compiler optimization?
>
> DMD compiler doesn't perform many optimizations, especially on
> floating point computations.

More precisely: in terms of optimizations performed, DMD isn't too far
behind gcc. But it performs almost no optimization on floating point.
Also, the inliner doesn't yet support the newer D features (this won't
be hard to fix), and the scheduler is based on Pentium 1.
May 15 2010
Don wrote:

> bearophile wrote:
>> DMD compiler doesn't perform many optimizations, especially on
>> floating point computations.
>
> More precisely: in terms of optimizations performed, DMD isn't too
> far behind gcc. But it performs almost no optimization on floating
> point. Also, the inliner doesn't yet support the newer D features
> (this won't be hard to fix), and the scheduler is based on Pentium 1.

Have to be careful when talking about floating point optimizations. For
example, x/c => x * 1/c is not done because of roundoff error. Also,
0 * x => 0 is not done because it is not a correct replacement if x is
a NaN.
May 16 2010
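[A sketch of doing such a transformation by hand, assuming D2, for the
case where you have decided the roundoff difference is acceptable for
your data: hoist the reciprocal out of the loop yourself instead of
relying on a compiler switch, so the behavior is explicit in the
source.]

    // May differ from x /= c in the last bit of each result.
    void scale(double[] a, double c)
    {
        immutable invC = 1.0 / c; // one divide instead of a.length divides
        foreach (ref x; a)
            x *= invC;
    }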
Walter Bright:

> x/c => x * 1/c is not done because of roundoff error. Also,
> 0 * x => 0 is not done because it is not a correct replacement if x
> is a NaN.

I have done a little experiment, compiling this D1 code with LDC:

import tango.stdc.stdio: printf;

void main(char[][] args) {
    double x = cast(double)args.length;
    double y = 0 * x;
    printf("%f\n", y);
}

I think the asm generated by ldc shows what you say:

ldc -O3 -release -inline -output-s test

_Dmain:
    pushl %ebp
    movl  %esp, %ebp
    andl  $-16, %esp
    subl  $32, %esp
    movsd .LCPI1_0, %xmm0
    movd  8(%ebp), %xmm1
    orps  %xmm0, %xmm1
    subsd %xmm0, %xmm1
    pxor  %xmm0, %xmm0
    mulsd %xmm1, %xmm0
    movsd %xmm0, 4(%esp)
    movl  $.str, (%esp)
    call  printf
    xorl  %eax, %eax
    movl  %ebp, %esp
    popl  %ebp
    ret   $8

So I have added an extra "unsafe floating point" optimization:

ldc -O3 -release -inline -enable-unsafe-fp-math -output-s test

_Dmain:
    subl $12, %esp
    movl $0, 8(%esp)
    movl $0, 4(%esp)
    movl $.str, (%esp)
    call printf
    xorl %eax, %eax
    addl $12, %esp
    ret  $8

GCC has similar switches.

Bye,
bearophile
May 17 2010
bearophile wrote:

> So I have added an extra "unsafe floating point" optimization:
>
> ldc -O3 -release -inline -enable-unsafe-fp-math -output-s test

In my view, such switches are bad news, because:

1. very few people understand the issues regarding wrong floating
   point optimizations

2. even those that do are faced with a switch that doesn't really
   define what unsafe fp optimizations it is doing, so there's no way
   to tell how it affects their code

3. the behavior of such a switch may change over time, breaking one's
   carefully written code

4. most of those optimizations can be done by hand if you want to,
   meaning that then their behavior will be reliable, portable and
   correct for your application

5. in my experience with such switches, almost nobody uses them, and
   the few that do use them wrongly

6. they add clutter, complexity, confusion and errors to the
   documentation

7. when people use them and their code doesn't work correctly, they
   blame the compiler/language and waste the time of the tech support
   people
May 17 2010
Walter Bright:

> In my view, such switches are bad news, because:

The Intel compiler, Microsoft compiler, GCC and LLVM have a similar
switch (fp:fast in the Microsoft compiler, -ffast-math in GCC, etc.).
So you might send your list of comments to the devs of each of those
four compilers.

I have used the "unsafe fp" switch in LDC to run my small raytracers
faster, with good results. So I use it now and then where max precision
is not important and small errors are not going to ruin the output.

I have asked the LLVM head developer to improve this optimization in
LLVM, because in my opinion it's not aggressive enough, to put LLVM on
par with GCC. So LDC too will probably get better at this in future.

This unsafe optimization is off by default, so if you don't like it you
can avoid it. Its presence in LDC has caused zero problems for me so
far (because when I need safer/more precise results I don't use it).

> 4. most of those optimizations can be done by hand if you want to,
> meaning that then their behavior will be reliable, portable and
> correct for your application

This is true for any optimization.

Bye,
bearophile
May 17 2010
bearophile wrote:

> Walter Bright:
>> In my view, such switches are bad news, because:
>
> The Intel compiler, Microsoft compiler, GCC and LLVM have a similar
> switch (fp:fast in the Microsoft compiler, -ffast-math in GCC, etc.).
> So you might send your list of comments to the devs of each of those
> four compilers.

If I agreed with everything other vendors did with their compilers, I
wouldn't have built my own <g>.
May 18 2010
Walter Bright wrote:

> Don wrote:
>> More precisely: in terms of optimizations performed, DMD isn't too
>> far behind gcc. But it performs almost no optimization on floating
>> point. Also, the inliner doesn't yet support the newer D features
>> (this won't be hard to fix), and the scheduler is based on Pentium 1.
>
> Have to be careful when talking about floating point optimizations.
> For example, x/c => x * 1/c is not done because of roundoff error.
> Also, 0 * x => 0 is not done because it is not a correct replacement
> if x is a NaN.

The most glaring limitation of the FP optimiser is that it seems to
never keep values on the FP stack, so it will often do:

    FSTP x
    FLD  x

instead of:

    FST  x

Fixing this would probably give a speedup of ~20% on almost all FP
code, and would unlock the path to further optimisation.
May 17 2010
Hello Don,

> The most glaring limitation of the FP optimiser is that it seems to
> never keep values on the FP stack, so it will often do:
>
>     FSTP x
>     FLD  x
>
> instead of:
>
>     FST  x
>
> Fixing this would probably give a speedup of ~20% on almost all FP
> code, and would unlock the path to further optimisation.

Does DMD have the groundwork for doing FP peephole optimizations? That
sounds like an easy one.

--
... <IXOYE><
May 17 2010
bearophile wrote:

> DMD compiler doesn't perform many optimizations,

This is simply false. DMD does an excellent job with integer and
pointer operations. It does a so-so job with floating point.

There are probably over a thousand optimizations at all levels that dmd
does with integer and pointer code. Compare the generated code with and
without -O. Even without -O, dmd does a long list of optimizations
(such as common subexpression elimination).
May 16 2010
Walter Bright:

> This is simply false. DMD does an excellent job with integer and
> pointer operations. It does a so-so job with floating point. There
> are probably over a thousand optimizations at all levels that dmd
> does with integer and pointer code.

You are of course right, I understand your feelings, I am a stupid -.-
I must be more precise in my posts. You are right that dmd surely
performs numerous optimizations. What I meant was a comparison with
other compilers, particularly ldc. And even then, generic words about a
generic comparison aren't useful. So I am sorry.

Bye,
bearophile
May 16 2010
On 5/16/2010 4:15 PM, Walter Bright wrote:

> bearophile wrote:
>> DMD compiler doesn't perform many optimizations,
>
> This is simply false. DMD does an excellent job with integer and
> pointer operations. It does a so-so job with floating point. There
> are probably over a thousand optimizations at all levels that dmd
> does with integer and pointer code. Compare the generated code with
> and without -O. Even without -O, dmd does a long list of
> optimizations (such as common subexpression elimination).

While it's false that DMD doesn't do many optimizations, it's true that
it's behind more modern compiler optimizers. I've been working to fix
some of the grossly bad holes in dmd's inliner, which is one area
that's just obviously lacking (see bug 2008). But gcc and ldc (and
likely msvc, though I lack any direct knowledge) are simply a decade or
so ahead. It's not a criticism of dmd or a suggestion that the
priorities are in the wrong place, just a point of fact. They've got
larger teams of people and are spending significant time on just
improving and adding optimizations.

Later,
Brad
May 16 2010
On 05/17/2010 01:15 AM, Walter Bright wrote:

> bearophile wrote:
>> DMD compiler doesn't perform many optimizations,
>
> This is simply false. DMD does an excellent job with integer and
> pointer operations. It does a so-so job with floating point.

Interesting to note, relative to my earlier experience with D vs. C++
speed:

http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.learn&artnum=19567

I'll have to try and put together a no-floating-point bit of code to
make a comparison.

Best wishes,

-- Joe
May 19 2010
On Thu, 13 May 2010 22:38:40 -0400, kai <kai nospam.zzz> wrote:

> Hello, I was evaluating using D for some numerical stuff. However I
> was surprised to find that looping & array indexing was not very
> speedy compared to alternatives (gcc et al). I was using the DMD2
> compiler on mac and windows, with -O -release. Here is a boiled down
> test case:
>
> void main (string[] args)
> {
>     double[] foo = new double[cast(int)1e6];
>     for (int i = 0; i < 1e3; i++) {
>         for (int j = 0; j < 1e6-1; j++) {
>             foo[j] = foo[j] + foo[j+1];
>         }
>     }
> }
>
> Any ideas? Am I somehow not hitting a vital compiler optimization?
> Thanks for your help.

I figured it out.

In D, the default value for doubles is NaN, so you are adding countless
scores of NaNs, which is costly for some reason (not a big floating
point guy, so I'm not sure about this). In C/C++, the default value for
doubles is 0.

BTW, without any initialization of the array, what are you expecting
the code to do? In the C++ version, I suspect you are simply adding a
bunch of 0s together.

Equivalent D code which first initializes the array to 0s:

void main (string[] args)
{
    double[] foo = new double[cast(int)1e6];
    foo[] = 0; // probably want to change this to something more meaningful
    for (int i = 0; i < cast(int)1e3; i++)
    {
        for (int j = 0; j < cast(int)1e6-1; j++)
        {
            foo[j] += foo[j+1];
        }
    }
}

On my PC, it runs almost exactly at the same speed as the C++ version.

-Steve
May 14 2010
Thanks for the help all!

> 2. Can you use vector operations? If the example you gave is
> representative of your specific problem, then you can't, because you
> are adding overlapping parts of the array. But if you are doing
> operations on separate arrays, then array operations will be *much*
> faster.

Unfortunately, I don't think I will be able to. The actual code is
computing norms of a sequence of points and then updating their values
as needed (MLE smoothing/prediction).

> For that evaluation you probably have to use the LDC compiler, which
> is able to optimize better.

I was scared off by the warning that D 2.0 support is experimental. I
realize D 2 itself is still non-production, but for academic interests
industrial-strength isn't all that important if it usually works :).

> Using floating point for indexes and lengths is not a good practice.
> In D large numbers are written like 1_000_000. Use -release too.

Good to know, thanks (that's actually a great feature for scientists!).

> The DMD compiler doesn't perform many optimizations, especially on
> floating-point computations. But the bigger problem in your code is
> that you are performing operations on NaNs (that's the default
> initialization of FP values in D), and operations on NaNs are usually
> quite slow.

> In D, the default value for doubles is NaN, so you are adding
> countless scores of NaNs, which is costly for some reason (not a big
> floating point guy, so I'm not sure about this).

Ah ha, that was it -- serves me right for trying to boil down a test
case and failing miserably. I'll head back to my code now and try to
find the real problem :-) At some point I removed the initialization
data obviously.
May 14 2010
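[For concreteness, a hypothetical sketch of the kind of loop described
above -- all names invented, and no attempt is made at the actual MLE
smoothing/prediction: squared Euclidean norms over two coordinate
arrays, written so every element is assigned before it is read.]

    double[] squaredNorms(const(double)[] xs, const(double)[] ys)
    {
        auto result = new double[xs.length];
        foreach (i; 0 .. xs.length)
            result[i] = xs[i] * xs[i] + ys[i] * ys[i];
        return result;
    }

    void main()
    {
        auto n = squaredNorms([3.0, 0.0], [4.0, 1.0]);
        assert(n == [25.0, 1.0]);
    }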
kai:

> I was scared off by the warning that D 2.0 support is experimental.

LDC is D1 still, mostly :-( And at the moment it uses LLVM 2.6. LLVM
2.7 contains a new optimization that can improve that code some more.

> Good to know, thanks (that's actually a great feature for
> scientists!).

In theory D is a bit fit for numerical computations too, but there is a
lot of work to do still. And some parts of D design will need to be
improved to help numerical code performance. From my extensive tests,
if you use it correctly, D1 code compiled with LDC can be about as
efficient as C code compiled with GCC, or sometimes a little more
efficient.

-------------

Steven Schveighoffer:

> In C/C++, the default value for doubles is 0.

I think in C and C++ the default value for doubles is "uninitialized"
(that is, anything).

Bye,
bearophile
May 14 2010
bearophile wrote:

> kai:
>> I was scared off by the warning that D 2.0 support is experimental.
>
> LDC is D1 still, mostly :-( And at the moment it uses LLVM 2.6. LLVM
> 2.7 contains a new optimization that can improve that code some more.
>
> In theory D is a bit fit for numerical computations too, but there is
> a lot of work to do still. And some parts of D design will need to be
> improved to help numerical code performance. From my extensive tests,
> if you use it correctly, D1 code compiled with LDC can be about as
> efficient as C code compiled with GCC, or sometimes a little more
> efficient.
>
> Steven Schveighoffer:
>> In C/C++, the default value for doubles is 0.
>
> I think in C and C++ the default value for doubles is "uninitialized"
> (that is, anything).

That depends. In C/C++, the default value for any global variable is to
have all bits set to 0, whatever that means for the actual data type.
The default value for local variables and malloc/new memory is
"whatever was in this place in memory before", which can be anything.
The default value for calloc is to have all bits set to 0, as for
global variables.

In the OP's code, the malloc will probably return memory that has never
been used before, therefore probably initialized to 0 too (OS
dependent).

Jerome

--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
May 14 2010
Jérôme M. Berger wrote:

> That depends. In C/C++, the default value for any global variable is
> to have all bits set to 0, whatever that means for the actual data
> type.

No it's not, it's always uninitialized.

Visual Studio will initialise memory & a function's stack segment with
0xcd, but only in debug builds. In release mode you get what was
already there. That used to be the case with gcc (which used
0xdeadbeef) as well, unless they've changed it.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
May 15 2010
div0 wrote:

> Jérôme M. Berger wrote:
>> That depends. In C/C++, the default value for any global variable is
>> to have all bits set to 0, whatever that means for the actual data
>> type.
>
> No it's not, it's always uninitialized.

According to the C89 standard and onwards it *must* be initialized to
0. If it isn't, then your implementation isn't standard compliant
(needless to say, gcc, Visual, llvm, icc and dmc are all standard
compliant, so you won't have any difficulty checking).

> Visual Studio will initialise memory & a function's stack segment
> with 0xcd, but only in debug builds. In release mode you get what was
> already there. That used to be the case with gcc (which used
> 0xdeadbeef) as well, unless they've changed it.

This does not concern global variables. Therefore the second part of my
message applies, the part you didn't quote:

>> The default value for local variables and malloc/new memory is
>> "whatever was in this place in memory before", which can be
>> anything. The default value for calloc is to have all bits set to 0,
>> as for global variables.

I should have added that some compilers / standard libraries allow you
to have a default initialization value for debugging purposes.

Jerome

--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
May 15 2010
Jérôme M. Berger wrote:

> div0 wrote:
>> Jérôme M. Berger wrote:
>>> That depends. In C/C++, the default value for any global variable
>>> is to have all bits set to 0, whatever that means for the actual
>>> data type.
>>
>> No it's not, it's always uninitialized.
>
> According to the C89 standard and onwards it *must* be initialized to
> 0. If it isn't, then your implementation isn't standard compliant
> (needless to say, gcc, Visual, llvm, icc and dmc are all standard
> compliant, so you won't have any difficulty checking).

Ah, I only do C++, where the standard is to not initialise. I didn't
know the two specs had diverged like that.

--
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
May 16 2010
"div0" <div0 users.sourceforge.net> wrote:No, in C++ all *global or static* variables are zero-initialized. By default, stack variables are default-initialized, which means that doubles in stack can have any value (they are uninitialized). The C-function calloc is required to fill the newly allocated memory with zero bit pattern; malloc is not required to initialize anything. Fresh heap areas given by malloc may have zero bit pattern, but one should really make no assumptions on this. -- JoukoAh, I only do C++, where the standard is to not initialise.Jérôme M. Berger wrote:That depends. In C/C++, the default value for any global variable is to have all bits set to 0 whatever that means for the actual data type.
May 16 2010
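[For contrast with the C/C++ rules above, a small D sketch (assuming
D2 and std.stdio): every D variable has a defined default value, and
for floating point that default is deliberately NaN, so uninitialized
values are conspicuous rather than silently zero.]

    import std.stdio;

    void main()
    {
        double d;                   // local, but still defined: NaN
        int i;                      // 0
        auto a = new double[3];     // all elements NaN
        writeln(d, " ", i, " ", a); // nan 0 [nan, nan, nan]
    }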
div0 wrote:

> Jérôme M. Berger wrote:
>> According to the C89 standard and onwards it *must* be initialized
>> to 0. If it isn't, then your implementation isn't standard compliant
>> (needless to say, gcc, Visual, llvm, icc and dmc are all standard
>> compliant, so you won't have any difficulty checking).
>
> Ah, I only do C++, where the standard is to not initialise. I didn't
> know the two specs had diverged like that.

The specs haven't diverged, and C++ has mostly the same behaviour as C
where global variables are concerned. The only difference is that if
the global variable is a class with a constructor, then that
constructor gets called after the memory is zeroed out.

Jerome

--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
May 16 2010
On Fri, 14 May 2010 12:40:52 -0400, bearophile
<bearophileHUGS lycos.com> wrote:

> Steven Schveighoffer:
>> In C/C++, the default value for doubles is 0.
>
> I think in C and C++ the default value for doubles is "uninitialized"
> (that is, anything).

You are probably right. All I did to figure this out is print out the
first element of the array in my C++ version of kai's code. So it may
be arbitrarily set to 0.

-Steve
May 17 2010
Steven Schveighoffer wrote:

> double[] foo = new double[cast(int)1e6];
> foo[] = 0;

I've discovered that this is the equivalent of the last line above:

foo = 0;

I don't see it in the spec. Is that an old or an unintended feature?

Ali
May 15 2010
Ali Çehreli <acehreli yahoo.com> wrote:

> Steven Schveighoffer wrote:
>> double[] foo = new double[cast(int)1e6];
>> foo[] = 0;
>
> I've discovered that this is the equivalent of the last line above:
>
> foo = 0;
>
> I don't see it in the spec. Is that an old or an unintended feature?

Looks unintended to me. In fact (though that might be the C programmer
in me doing the thinking), it looks to me like foo = null;. It might be
related to the discussion in digitalmars.D, "Is [] mandatory for array
operations?".

--
Simen
May 15 2010
Simen kjaeraas wrote:

> Ali Çehreli <acehreli yahoo.com> wrote:
>
>> Steven Schveighoffer wrote:
>>> double[] foo = new double[cast(int)1e6];
>>> foo[] = 0;
>>
>> I've discovered that this is the equivalent of the last line above:
>>
>> foo = 0;
>>
>> I don't see it in the spec. Is that an old or an unintended feature?
>
> Looks unintended to me. In fact (though that might be the C
> programmer in me doing the thinking), it looks to me like foo = null;.
> It might be related to the discussion in digitalmars.D, "Is []
> mandatory for array operations?".

I have to make a correction: it works with fixed-sized arrays. It does
not work with the dynamic array initialization above.

Thanks,
Ali
May 15 2010
Ali Çehreli:

> I don't see it in the spec. Is that an old or an unintended feature?

It's a compiler bug, don't use that bracket-less syntax in your
programs. Don is fighting to fix such problems (and I have written
several posts and bug reports on that stuff).

Bye,
bearophile
May 15 2010
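[To summarize the forms that are safe to rely on, a sketch assuming
D2: keep the [] on the left-hand side for element-wise assignment and
avoid the bracket-less spelling entirely.]

    void main()
    {
        double[3] fixed;
        auto dynamic = new double[3];
        fixed[] = 0.0;   // fills the static array
        dynamic[] = 0.0; // fills the dynamic array
        // fixed = 0.0;  // compiles for static arrays, but prefer fixed[]
        // dynamic = 0;  // the buggy form discussed above; don't use it
    }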
kai wrote:

> Here is a boiled down test case:
>
> void main (string[] args)
> {
>     double[] foo = new double[cast(int)1e6];
>     for (int i = 0; i < 1e3; i++) {
>         for (int j = 0; j < 1e6-1; j++) {
>             foo[j] = foo[j] + foo[j+1];
>         }
>     }
> }
>
> Any ideas?

    for (int j=0;j<1e6-1;j++)

The j<1e6-1 is a floating point operation. It should be redone as an
integer one:

    j<1_000_000-1
May 21 2010
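[The fix applied to the original test case, for reference (a sketch,
assuming D2): make the loop bound an integer constant so the
comparison never leaves the integer unit.]

    void main()
    {
        enum N = 1_000_000; // an integer constant, unlike the double 1e6
        auto foo = new double[N];
        foo[] = 1.0;
        for (int i = 0; i < 1_000; i++)
            for (int j = 0; j < N - 1; j++) // pure integer comparison
                foo[j] = foo[j] + foo[j + 1];
    }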
Walter Bright:

> for (int j=0;j<1e6-1;j++)
>
> The j<1e6-1 is a floating point operation. It should be redone as an
> integer one:
>
> j<1_000_000-1

The syntax "1e6" can represent an integer value of one million as
perfectly and as precisely as "1_000_000", but traditionally in many
languages the exponential syntax is used to represent floating point
values only; I don't know why. If the OP wants a short syntax to
represent one million, this syntax can be used in D2:

foreach (j; 0 .. 10^^6)

Bye,
bearophile
May 22 2010