digitalmars.D - [performance]PreInitializing is an annoyance
- Manfred Nowak (15/15) Jan 31 2005 I wrote about this before.
- Nick Sabalausky (7/23) Jan 31 2005 By the time we have systems that have >1TB RAM, I'm sure the memory bus
- Manfred Nowak (9/10) Jan 31 2005 The bus speeds can go as fast as they want. One cpu needs at least two
- Lionello Lunesu (11/27) Feb 01 2005 Hi..
- Manfred Nowak (12/16) Feb 01 2005 But that is not portable between OS's.
- pragma (19/25) Feb 01 2005 Possibly the most compelling fact, that D's arrays are not for huge chun...
- Dawid Ciężarkiewicz (12/33) Feb 01 2005 I was reading documentation of D and found paragraph about initializatio...
- Anders F Björklund (18/29) Feb 01 2005 It's for the other 0.1% percent, where forgetting to initialize
- Norbert Nemec (9/11) Feb 01 2005 This definitely is not an excuse for any limitation in D. Assembler is
- Anders F Björklund (12/15) Feb 01 2005 I see D as a nice replacement for C++ in the (very) long run,
- Norbert Nemec (10/20) Feb 01 2005 For Java this is clear - it has a completely different objective, so D d...
- Anders F Björklund (8/17) Feb 02 2005 Just some... And the languages are not *that* different, actually ?
- Norbert Nemec (4/13) Feb 02 2005 OK, I probably came on a bit too fast with my answer. C certainly has it...
- Walter (11/15) Feb 02 2005 Currently, the only cases where C fits better are:
- Norbert Nemec (4/10) Feb 02 2005 5) You want your code to look pretty:
- Walter (5/15) Feb 02 2005 You might want to check out:
- Walter (24/34) Feb 02 2005 You can write "C" code in D. Take a look at the Empire source code.
- Anders F Björklund (18/37) Feb 02 2005 Yeah, you can hardly escape NullPointerErrors even in virtual machines.
- Dave (16/21) Feb 02 2005 FWIW, that's where I see D making its biggest inroads initially with th...
- Dawid Ciężarkiewicz (14/26) Feb 01 2005 I can agree that setting floats to NAN make sense, but setting forgotten...
- Anders F Björklund (4/7) Feb 01 2005 Hard to say in the general case, I recommend playing with -O and disasm?
- Walter (8/16) Feb 02 2005 of
- Norbert Nemec (4/11) Feb 02 2005 'usually' is the point here. In certain cases, the compiler will not be ...
- Walter (6/16) Feb 02 2005 one
- Manfred Nowak (5/8) Feb 02 2005 The measure to be taken in the case I put in this discussion is the
- Norbert Nemec (15/36) Feb 01 2005 This issue should not need any measurements for justification. It is cle...
- pragma (6/15) Feb 01 2005 The nice thing about this is that pragmas in D are not be ignored if not
- Dawid Ciężarkiewicz (5/24) Feb 02 2005 And you could always use "else" to initialize memory old way. I like it.
- Ben Hinkle (10/25) Feb 01 2005 The ironic part is that the GC itself has malloc but the D interface to ...
- J C Calvarese (7/38) Feb 01 2005 As long as we're doing wishful thinking, maybe we could just put a
- Norbert Nemec (4/6) Feb 01 2005 You don't really have to go for 1TB to see the use of that option. W...
- Ivan Senji (14/20) Feb 02 2005 Write
- Georg Wrede (6/9) Feb 01 2005 Admittedly, I haven't followed processor specs for a while, but
- Brian Chapman (30/39) Feb 01 2005 No guys. Cycle counting is as old as MSDOS floppy disks. It's much
- Ben Hinkle (8/14) Feb 01 2005 D is aiming to support bare-metal programming, from what I gather from t...
- Brian Chapman (11/20) Feb 03 2005 Sorry, Ben. I was out of line with the Java comment. A little moment of
- Norbert Nemec (29/35) Feb 02 2005 Writing assembler to get performance is about as outdated as counting
- Anders F Björklund (11/19) Feb 02 2005 Of course, *reading* assembler is a good way to help write that good C
- Norbert Nemec (15/31) Feb 02 2005 Of course: if you want to exploit your compiler you have to know it, so
- Anders F Björklund (14/18) Feb 02 2005 I think the new GCC default on Mac OS X, -Os, is a fair trade-off ?
- Dave (13/17) Feb 02 2005 In article ,
- Anders F Björklund (4/6) Feb 02 2005 See http://gcc.gnu.org/develop.html#stage3, they're at the final stage.
- Norbert Nemec (7/19) Feb 02 2005 Have you tested the current floating point performance of gdc, compared ...
- Thomas Kuehne (17/17) Feb 02 2005 -----BEGIN PGP SIGNED MESSAGE-----
- Anders F Björklund (10/13) Feb 02 2005 On Mac OS X,
- Dave (12/28) Feb 02 2005 There has been some of that for floating point posted here (on the NG) a...
- zwang (5/39) Feb 02 2005 In general, rewriting C in assembly doesn't improve much,
- Anders F Björklund (4/8) Feb 02 2005 Then again, modern compilers can use those instructions too...
- Dave (7/15) Feb 02 2005 If you still have that code handy, would it be possible to run it throug...
- Norbert Nemec (10/28) Feb 02 2005 Already found out to my disappointment that I don't have it on my local
- Brian Chapman (27/59) Feb 03 2005 I disagree with this statement so very much I wouldn't even know where
- Georg Wrede (2/5) Feb 03 2005 Why?
- Brian Chapman (4/11) Feb 03 2005 Excellent question. All enlightenment begins with asking a good "why?"
- Georg Wrede (6/21) Feb 04 2005 Well, I at least hope the answer would be of general interest in
- Norbert Nemec (27/76) Feb 04 2005 I agree - I've discussed with several people on this and hardly ever cam...
- Kevin Bealer (74/74) Feb 01 2005 One potential solution: Formalize the existing (or semi-well known) met...
- Walter (17/31) Feb 02 2005 There are several ways to create an array. If the array is statically
- Vathix (2/6) Feb 02 2005 I like it, but will it work with 'new'? When newing arrays and value typ...
- Walter (12/19) Feb 02 2005 been
- Andy Friesen (6/25) Feb 02 2005 All this makes me think that the only thing that really needs to be done...
I wrote about this before. There is a well-known time/space tradeoff for the preInitialization of arrays: using about three times the space, one can lazily initialize an array. But this technique is useless within D, because of the automatic preInitialization, which currently eats up about 3 cycles per byte. Please be aware that on a 3GHz machine the busy preInitialization of one GB then lasts one second. And the coming 64-bit machines will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which is the current main memory limit for win32. Consider that to preInitialize one TB you would have to wait more than 15 minutes, and then at least another 15 minutes until your video editing can start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off. -manfred
Jan 31 2005
By the time we have systems that have >1TB RAM, I'm sure the memory bus speeds will be much faster than they are now (and if they aren't, then there's a lot of other things besides initing arrays that would take insanely long as well). So I don't think it would take nearly as long as 15 minutes. But aside from that, you do raise an interesting point. "Manfred Nowak" <svv1999 hotmail.com> wrote in message news:ctmtne$2car$1 digitaldaemon.com...I wrote about this before. There is a well-known time/space tradeoff for the preInitialization of arrays: using about three times the space, one can lazily initialize an array. But this technique is useless within D, because of the automatic preInitialization, which currently eats up about 3 cycles per byte. Please be aware that on a 3GHz machine the busy preInitialization of one GB then lasts one second. And the coming 64-bit machines will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which is the current main memory limit for win32. Consider that to preInitialize one TB you would have to wait more than 15 minutes, and then at least another 15 minutes until your video editing can start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off. -manfred
Jan 31 2005
"Nick Sabalausky" wrote: [...]memory bus speeds will be much faster than they are nowThe bus speeds can go as fast as they want. One cpu needs at least two cycles to store the next value: one cycle for incrementig the address and one to store the value, i.e. a 4GHZ cannot be faster than 0.5s for initializing one GB of RAM: still more than eight minutes for one TB. And in standard machinces the amount of RAM grew in the last ten years by the factor three more than the CPU-frequency. -manfred
Jan 31 2005
Hi.. I guess you can easily do an OS call in such a case: HeapAlloc (or even better: VirtualAlloc) in Win32. You should use these functions anyway for large blocks of data (dynamic arrays are not meant for that, never were). Or simply call malloc, it doesn't initialize the values either. This sure is better than any crt_init(bool) or whatever call you're thinking of to turn on/off array initializations. A compile-time option would be even worse: it would break a program if compiled with the wrong flag. Lionello. "Manfred Nowak" <svv1999 hotmail.com> wrote in message news:ctmtne$2car$1 digitaldaemon.com...I wrote about this before. There is a well-known time/space tradeoff for the preInitialization of arrays: using about three times the space, one can lazily initialize an array. But this technique is useless within D, because of the automatic preInitialization, which currently eats up about 3 cycles per byte. Please be aware that on a 3GHz machine the busy preInitialization of one GB then lasts one second. And the coming 64-bit machines will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which is the current main memory limit for win32. Consider that to preInitialize one TB you would have to wait more than 15 minutes, and then at least another 15 minutes until your video editing can start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off. -manfred
Feb 01 2005
"Lionello Lunesu" wrote:I guess you can easily do an OS callBut that is not portable between OS's. [...]large blocks of data (dynamic arrays are not meant for that, never were).From where do you have this wisdom? If so please explain the prereqiesites for the usage of dynamic arrays and for fixed arrays as well.Or simply call malloc, it doesn't initialize the values either.[...] I would use malloc, if arrays at all are not to be used for large amounts of memory. Which seems to be a contradiction to the fact that memory cells are laid out like an array. -manfred
Feb 01 2005
In article <ctohtt$10mr$1 digitaldaemon.com>, Manfred Nowak says...[...]large blocks of data (dynamic arrays are not meant for that, never were).From where do you have this wisdom? If so, please explain the prerequisites for the usage of dynamic arrays and for fixed arrays as well.Possibly the most compelling fact, that D's arrays are not for huge chunks of memory, is that they rely on copy-on-write semantics. With a 1GB chunk of memory in a single D array, modifications to slices and concatenations would result in a complete realloc and copy; so you could only use *one-half to a third* of your system's memory... and that's on a good day with aggressive memory management and no GC. Also, it's ill-advised to use a single array directly for something like this simply due to the cache-misses that are likely to result from random-accesses on a 1GB structure; the same goes for virtually every language out there. To that end D sits firmly in the "trade more space for less running time" optimization camp, which is fine for the majority of tasks out there. Superscale blobs of data require a behavior that certainly can be expressed in D, but is not enshrined in its underlying design. IMO, the optimal (all-round) solution for working with massive data structures would approach the complexity of a memory manager and not a series of simple array manipulations. From there, you could implement an array-like interface to make it more friendly to use, but it would still be a far cry from a true array. - EricAnderton at yahoo
Feb 01 2005
Manfred Nowak wrote:I wrote about this before. There is a well-known time/space tradeoff for the preInitialization of arrays: using about three times the space, one can lazily initialize an array. But this technique is useless within D, because of the automatic preInitialization, which currently eats up about 3 cycles per byte. Please be aware that on a 3GHz machine the busy preInitialization of one GB then lasts one second. And the coming 64-bit machines will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which is the current main memory limit for win32. Consider that to preInitialize one TB you would have to wait more than 15 minutes, and then at least another 15 minutes until your video editing can start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off. -manfredI was reading documentation of D and found a paragraph about initialization of variables. Can somebody tell me what is the point in doing so, and if it's done just by setting them after creation (and IMHO wasting cpu cycles) or is it cost free (I don't know how that would be possible, but that is why I'm asking). Your post just reminded me of that case. I'm going even further. Why initialize anything at all? In 99.9% of cases variables are initialized one or two lines after creation and the default isn't used at all. -- Dawid Ciężarkiewicz | arael jid: arael fov.pl
Feb 01 2005
Dawid Ciężarkiewicz wrote:Your post just reminded me of that case. I'm going even further. Why initialize anything at all? In 99.9% of cases variables are initialized one or two lines after creation and the default isn't used at all.It's for the other 0.1%, where forgetting to initialize a variable causes a subtle bug ? In other languages, such as Objective-C for instance, these are separate events altogether:NewObject *newObject; // newObject will be an instance of the NewObject class newObject = [[NewObject alloc] init]; // create and initialize the object [newObject doSomethingWith: anotherObject];But in D, both will be performed when using the "new" keyword. http://www.digitalmars.com/d/class.html#constructors:Members are always initialized to the default initializer for their type, which is usually 0 for integer types and NAN for floating point types. This eliminates an entire class of obscure problems that come from neglecting to initialize a member in one of the constructors.Of course, in Objective-C you also have to retain/release or use Autorelease Pools, which is more fuss than D's garbage collection... (since it uses a simpler manual method of reference counting) You can still just kill it, of course, similar to "delete" in D:[newObject dealloc];Which has all of the double-free and dangling pointer fun, too. PreInitializing and GarbageCollecting are a whole lot easier to use. And for local variables, a reasonably good compiler should be able to optimize out the .init value, if it's just replaced right away... If not, then your code probably has other performance problems ;-) You can still write critical parts in C or even asm, and link them in ? (or write D code using C-standard functions or DMD inline X86 assembler) --anders
Feb 01 2005
Anders F Björklund wrote:You can still write critical parts in C or even asm, and link them in ? (or write D code using C-standard functions or DMD inline X86 assembler)This definitely is not an excuse for any limitation in D. Assembler is important for systems programming, but trying to beat a modern compiler with hand-written assembler will work only in very special cases. Simple code can usually be optimized by the compiler anyway, and complicated code will become a mess written in assembler. For C - of course, one should be able to bind in existing C code, but in the long term, there should not be any reason to write new code in C if you have a D compiler at hand.
Feb 01 2005
Norbert Nemec wrote:For C - of course, one should be able to bind in existing C code, but in the long term, there should not be any reason to write new code in C if you have a D compiler at hand.I see D as a nice replacement for C++ in the (very) long run, but I will continue to use either C or Java when they fit better... But I agree that D is a bit strange in that it's pretty easy to e.g. dereference null, but hard to e.g. allocate uninited memory ? So far the performance has been good (just a few Mac OS X quirks still), and seems to be one of the key points of D. So it's right to address it. Maybe the auto initialization can be replaced with an error if you try to actually use the value without setting it first ? Like in Java. (Java does D-style init of members, but only such errors for local vars. Not sure how much work it is for the compiler to catch such errors ?) --anders
Feb 01 2005
Anders F Björklund wrote:Norbert Nemec wrote:For Java this is clear - it has a completely different objective, so D does not even try to compete with it in every respect. For C on the other hand, D should try to surpass it in every respect, so that there are no cases left, where C "fits better". This certainly is an ambitious goal that may never be reached completely, but nevertheless, it is a goal.For C - of course, one should be able to bind in existing C code, but in the long term, there should not be any reason to write new code in C if you have a D compiler at hand.I see D as a nice replacement for C++ in the (very) long run, but I will continue to use either C or Java when they fit better...Maybe the auto initialization can be replaced with an error if you try to actually use the value without setting it first ? Like in Java.This would not make much difference. If the compiler is able to detect this error, it is also able to optimize away unnecessary initializations. This is simple in trivial cases but - I believe - impossible in general.
Feb 01 2005
Norbert Nemec wrote:I see D as a nice replacement for C++ in the (very) long run, but I will continue to use either C or Java when they fit better...For Java this is clear - it has a completely different objective, so D does not even try to compete with it in every respect.Just some... And the languages are not *that* different, actually ? (just talking about the Java language, not the JVM or the religion)For C on the other hand, D should try to surpass it in every respect, so that there are no cases left, where C "fits better". This certainly is an ambitious goal that may never be reached completely, but nevertheless, it is a goal.I don't see either D or C++ as a replacement for regular C, more as a complement? In my world, C is a more portable alternative to assembler. This does not mean I write everything in it (or in assembler, either) For me, D fits in nicely between the C and Java language "extremes"... --anders
Feb 02 2005
Anders F Björklund wrote:Norbert Nemec wrote:OK, I probably came on a bit too fast with my answer. C certainly has its uses. Personally, I don't use it at all, but then - everybody has a limited view of the world...For C on the other hand, D should try to surpass it in every respect, so that there are no cases left, where C "fits better". This certainly is an ambitious goal that may never be reached completely, but nevertheless, it is a goal.I don't see either D or C++ as a replacement for regular C, more as a complement? In my world, C is a more portable alternative to assembler. This does not mean I write everything in it (or in assembler, either)
Feb 02 2005
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message news:ctq038$2de3$1 digitaldaemon.com...For C on the other hand, D should try to surpass it in every respect, so that there are no cases left, where C "fits better". This certainly is an ambitious goal that may never be reached completely, but nevertheless, it is a goal.Currently, the only cases where C fits better are: 1) you need to work with existing C code 2) there isn't a D compiler for the target 3) you're working with a tool that generates C code 4) your staff is content using C and will not try anything else These are all environmental considerations, not language issues. It's faster to write code in D, faster to compile it and faster to debug it. If I'm missing something, if there is something that the C language is a better fit for, I'd like to know what it is!
Feb 02 2005
Walter wrote:Currently, the only cases where C fits better are: 1) you need to work with existing C code 2) there isn't a D compiler for the target 3) you're working with a tool that generates C code 4) your staff is content using C and will not try anything else5) You want your code to look pretty: http://www.de.ioccc.org/2004/anonymous.c :-)
Feb 02 2005
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message news:ctrj9i$146d$1 digitaldaemon.com...Walter wrote:You might want to check out: <g>Currently, the only cases where C fits better are: 1) you need to work with existing C code 2) there isn't a D compiler for the target 3) you're working with a tool that generates C code 4) your staff is content using C and will not try anything else5) You want your code to look pretty: http://www.de.ioccc.org/2004/anonymous.c :-)
Feb 02 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:ctomh9$14jr$1 digitaldaemon.com...I see D as a nice replacement for C++ in the (very) long run, but I will continue to use either C or Java when they fit better...You can write "C" code in D. Take a look at the Empire source code <g>.But I agree that D is a bit strange in that it's pretty easy to e.g. dereference null, but hard to e.g. allocate uninited memory ?It isn't strange viewed from the perspective that dereferencing null always generates a seg fault, and so cannot be ignored, overlooked, etc. Allocating uninitialized memory can lead to erratic, random behavior which sometimes can *appear* to work successfully, hence the idea that this is a bad thing that must be stamped out. Predictable, consistent behavior is what makes for robust, debuggable, error free programs.So far the performance has been good (just a few Mac OS X quirks still), and seems to be one of the key points of D. So it's right to address it.I wish to point out that DMDScript in D is faster than DMDScript in C++, despite the D version doing the automatic initialization (and the other safety features in D). The magic dust at work here is the D profiler and the ease in manipulating the D source code to make it faster. (D code, I've discovered, is easier than C++ to manipulate source to try to make it run faster. I spent a lot less time tuning the D code, and got better results.)Maybe the auto initialization can be be replaced with an error if you try to actually use the value without setting it first ? Like in Java.That only works well if you've got hardware support for it. The compiler cannot reliably determine this, though some compilers fake it and issue spurious and irritatingly wrong warnings when they get it wrong.(Java does D-style init of members, but only such errors for local vars. 
Not sure how much work it is for the compiler to catch such errors ?)The same techniques for catching such errors at compile time can be used instead to eliminate initializations that are not needed, which is done by DMD. I much prefer the latter approach, as when an initialization is redundant but such redundancy is not detectable, it "fails safe" by leaving the initialization in rather than issuing a nuisance error message.
Feb 02 2005
Walter wrote:Yeah, you can hardly escape NullPointerErrors even in virtual machines. Just meant that there are other languages that do more hand-holding? D has this funny mix of low and high level, that takes some getting used to. But I like it :-) At least most of it, save a few rants... ;-)But I agree that D is a bit strange in that it's pretty easy toIt isn't strange viewed from the perspective that dereferencing null always generates a seg fault, and so cannot be ignored, overlooked, etc. Allocating uninitialized memory can lead to erratic, random behavior which sometimes can *appear* to work successfully, hence the idea that this is a bad thing that must be stamped out.e.g. dereference null, but hard to e.g. allocate uninited memory ?I wish to point out that DMDScript in D is faster than DMDScript in C++, despite the D version doing the automatic initialization (and the other safety features in D). The magic dust at work here is the D profiler and the ease in manipulating the D source code to make it faster. (D code, I've discovered, is easier than C++ to manipulate source to try to make it run faster. I spent a lot less time tuning the D code, and got better results.)The differences I'm seeing are mostly due to the fact that Apple has spent a lot of time tuning their compiler for C, Objective-C and C++ (and even the bastard child Objective-C++) but for D, I need to use the regular GCC which only has a few of those PowerPC tunings done... This gets even larger when using vector operations, on the PPC G4/G5. For DMD platforms, such as Win32 or Linux X86, this is not an issue. (or maybe less of an issue, as I don't know how the SSE/MMX support is?)The same techniques for catching such errors at compile time can be used instead to eliminate initializations that are not needed, which is done by DMD. 
I much prefer the latter approach, as when an initialization is redundant but such redundancy is not detectable, it "fails safe" by leaving the initialization in rather than issuing a nuisance error message.It's also simpler to code with variables that start with a known value, rather than getting warnings later (even if they could be done reliably) I actually like that member fields are inited with known initializers, usually zeroes, and the locals will be optimized out anyway... Leaving large arrays and such, which is something that still can be addressed. --anders
Feb 02 2005
In article <ctraod$rm5$1 digitaldaemon.com>, Walter says..."Anders F Björklund" <afb algonet.se> wrote in message news:ctomh9$14jr$1 digitaldaemon.com...I see D as a nice replacement for C++ in the (very) long run, but I will continue to use either C or Java when they fit better...You can write "C" code in D. Take a look at the Empire source code <g>.FWIW, that's where I see D making its biggest inroads initially with the general programming community, especially now that Linux is surging and has a large number of fluent C programmers who are generally not forced into "vendor (or language or tool) tie-in". They'll use it a lot like C except maybe actually start to use OOP because D makes that very straight-forward ;) That's also why I personally tend to put a lot of stock in run-time performance; an easier/safer/gc'd "C" that generally performs as well at v1.0 (and potentially better in the future) sounds like a pretty darn good reason to risk a switch for me <g>. I mean the reason a systems programmer 'drops to C' and doesn't do everything in Perl or Python or whatever is because of performance. That's usually the most compelling reason anyway. Less than equal performance (or even perceived performance via common benchmark results) will also be an equally compelling reason for many C programmers to not try D ;) - Dave
Feb 02 2005
Anders F Björklund wrote:Dawid Ciężarkiewicz wrote:Your post just reminded me of that case. I'm going even further. Why initialize anything at all? In 99.9% of cases variables are initialized one or two lines after creation and the default isn't used at all.It's for the other 0.1%, where forgetting to initialize a variable causes a subtle bug ? In other languages, such as Objective-C for instance, these are separate events altogether:I can agree that setting floats to NAN makes sense, but setting a forgotten int to an arbitrary value won't help the program so much. It will let it give the same errors rather than random ones. :)PreInitializing and GarbageCollecting are a whole lot easier to use.I agree that GarbageCollecting is *necessary* for a modern computer programming language. And I agree that there should be a way to disable it if needed. Just as with variable initialization - sometimes (in critical parts of code) it would be good to be able to disable this.And for local variables, a reasonably good compiler should be able to optimize out the .init value, if it's just replaced right away...This is part of the answer that I expected and I'm glad to hear, but what about variables that are not expected to have any value at the start and are initialized later? -- Dawid Ciężarkiewicz | arael jid: arael fov.pl
Feb 01 2005
Dawid Ciężarkiewicz wrote:This is part of the answer that I expected and I'm glad to hear, but what about variables that are not expected to have any value at the start and are initialized later?Hard to say in the general case, I recommend playing with -O and disasm? (if you use GDC, you can compare -O0...-O3, and get asm output with -S) --anders
Feb 01 2005
"Dawid Ciê¿arkiewicz" <arael fov.pl> wrote in message news:ctnv58$c53$1 digitaldaemon.com...I was reading documentation of D and found paragraph about initializationofvariables. Can somebody tell me what is the point in doing so and if it's done just by setting them after creation (and IMHO wasting cpu cycles) or is it cost free (I don't know how would be that possible, but that is why I'm asking).The point of it is to eliminate a common, and difficult to find, source of bugs.Your post just remind me that case. I'm going even further. Why to initialize anything at all? In 99.9% cases variables are initialized oneortwo lines after creation and default isn't used at all.In those cases, the optimizer will usually eliminate the initializer (since it is a "dead assignment").
Feb 02 2005
Walter wrote:Your post just reminded me of that case. I'm going even further. Why initialize anything at all? In 99.9% of cases variables are initialized one or two lines after creation and the default isn't used at all.In those cases, the optimizer will usually eliminate the initializer (since it is a "dead assignment").'usually' is the point here. In certain cases, the compiler will not be able to determine that it actually is a dead assignment and leave it in. There should be a compiler pragma to tell the compiler about it in this case.
Feb 02 2005
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message news:ctr980$pr9$1 digitaldaemon.com...oneYour post just remind me that case. I'm going even further. Why to initialize anything at all? In 99.9% cases variables are initializedableor'usually' is the point here. In certain cases, the compiler will not betwo lines after creation and default isn't used at all.In those cases, the optimizer will usually eliminate the initializer (since it is a "dead assignment").to determine that it actually is a dead assignment and leave it in. There should be a compiler pragma to tell the compiler about it in this case.I honestly think that in a non-trivial program, you'd be very, very hard pressed to see a measurable difference in program performance from this.
Feb 02 2005
"Walter" wrote: [...]I honestly think that in a non-trivial program, you'd be very, very hard pressed to see a measurable difference in program performance from this.The measure to b taken in the case I put in this discussion is the steadiness of the run by lazy initializing. This costly in total. -manfred
Feb 02 2005
This issue should not need any measurements for justification. It is clear that preinitializing causes some overhead. Personally, I believe there needs to be some way to deactivate it. One cannot expect the compiler to optimize away all unnecessary initializations. Especially for arrays, where initialization really becomes an issue, the initialization might not happen in one simple loop. As I understand the philosophy of D, it does allow the user to shoot himself in the foot if he really wants to. It is ok to default to a safe behaviour, but experts should be able to deactivate the safety measures by some explicit command. (A global compiler option is not a good idea! It has to be specified right in the code in some way.) Any ideas for a possible syntax specifying "This variable should not be initialized"? It should be possible both for variables as well as for dynamically allocated memory or class members. Manfred Nowak wrote:I wrote about this before. There is a well-known time/space tradeoff for the preInitialization of arrays: using about three times the space, one can lazily initialize an array. But this technique is useless within D, because of the automatic preInitialization, which currently eats up about 3 cycles per byte. Please be aware that on a 3GHz machine the busy preInitialization of one GB then lasts one second. And the coming 64-bit machines will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which is the current main memory limit for win32. Consider that to preInitialize one TB you would have to wait more than 15 minutes, and then at least another 15 minutes until your video editing can start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off. -manfred
Feb 01 2005
In article <ctoitr$11pk$1 digitaldaemon.com>, Norbert Nemec says...Any ideas for a possible syntax specifying "This variable should not be initialized"? It should be possible both for variables as well as for dynamically allocated memory or class members.This sounds like a job for a compiler pragma.pragma(noinit) int a; // a gets a random value now (just like C!). int[] b; pragma(noinit){ b = new int[1024*1024*1024]; // b gets an uninitialized 1GB block. }The nice thing about this is that pragmas in D are not ignored if not understood. So while compiler dependent, the code won't compile on D compilers that don't support 'noinit'. - EricAnderton at yahoo
Feb 01 2005
pragma wrote:In article <ctoitr$11pk$1 digitaldaemon.com>, Norbert Nemec says...Any ideas for a possible syntax specifying "This variable should not be initialized"? It should be possible both for variables as well as for dynamically allocated memory or class members.This sounds like a job for a compiler pragma.pragma(noinit) int a; // a gets a random value now (just like C!). int[] b; pragma(noinit){ b = new int[1024*1024*1024]; // b gets an uninitialized 1GB block. }The nice thing about this is that pragmas in D are not ignored if not understood. So while compiler dependent, the code won't compile on D compilers that don't support 'noinit'. - EricAnderton at yahooAnd you could always use "else" to initialize memory the old way. I like it. -- Dawid Ciężarkiewicz | arael jid: arael fov.pl
Feb 02 2005
"Manfred Nowak" <svv1999 hotmail.com> wrote in message news:ctmtne$2car$1 digitaldaemon.com...I wrote about this before. There is a well known time/space-tradeoff for the preInitialization of arrays: using about three times the space one can lazy initialize an array. But this technic is useless within D, because of the automaatic preInitialization, which currently eats up about 3 cycles per byte. Please awaken to, that on a 3GHz machine the busy preInitalization of one GB then lasts one second. And the coming 64-bit-machine will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which ist the current main memory limit for win32. Check again, that to preInitialize one TB you have to wait more than 15 minutes only to wait at least another 15 minutes until your videoediting can start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off. -manfredThe ironic part is that the GC itself has malloc but the D interface to it always clears the result. See src/phobox/internal/gc.d routine _d_newarrayi. It would be really nice to have the following added to the GC interface: void* malloc(size_t len) { return _gc.malloc(len); } Ah, to have something so close and yet so far away... And while I'm at it how about exposing _gc.realloc, _gc.free and _gc.capacity, too. oh, now I'm just dreaming I know. -Ben
Feb 01 2005
In article <ctokln$13jb$1 digitaldaemon.com>, Ben Hinkle says..."Manfred Nowak" <svv1999 hotmail.com> wrote in message news:ctmtne$2car$1 digitaldaemon.com...I wrote about this before. There is a well known time/space-tradeoff for the preInitialization of arrays: using about three times the space one can lazily initialize an array. But this technique is useless within D, because of the automatic preInitialization, which currently eats up about 3 cycles per byte. Please realize that on a 3GHz machine the busy preInitialization of one GB then lasts one second, and the coming 64-bit machines will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which is the current main memory limit for win32. Consider that to preInitialize one TB you would have to wait more than 15 minutes before your video editing can even start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off. -manfredThe ironic part is that the GC itself has malloc but the D interface to it always clears the result. See src/phobos/internal/gc.d routine _d_newarrayi. It would be really nice to have the following added to the GC interface: void* malloc(size_t len) { return _gc.malloc(len); } Ah, to have something so close and yet so far away... And while I'm at it, how about exposing _gc.realloc, _gc.free and _gc.capacity, too. Oh, now I'm just dreaming, I know. -BenAs long as we're doing wishful thinking, maybe we could just put a readable/writeable property in the GC module called: "noinit". After _d_newarrayi malloc's the memory, it checks "noinit" to see if it should initialize or not. The majority of programmers would never touch this setting. Those that have to clear 1 TB of memory have the option. jcc7
Feb 01 2005
J C Calvarese wrote:The majority of programmers would never touch this setting. Those that have to clear 1 TB of memory have the option.You don't really have to go for 1TB to see the use of that option. Write a routine that has a 1KB array locally on the stack and call that routine repeatedly...
Feb 01 2005
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message news:ctq0t2$2e32$1 digitaldaemon.com...J C Calvarese wrote:WriteThe majority of programmers would never touch this setting. Those that have to clear 1 TB of memory have the option.You don't really have to go for one 1TB to see the use of that option.a routine that has a 1KB array locally on the stack and call that routine repeatedly...I had a situation like this (but more than 1KB) and it didn't seem to work that slow, but i am sure it would work faster if it wasn't initialized every time. Why not try to persuade Walter of some syntax that would allow to create uninitialized arrays? noinit int[100000] array; or any other form that could be used. This would ofcourse be an option and used only when you need it and are sure that you will initialize the data later.
Feb 02 2005
"Manfred Nowak" <svv1999 hotmail.com> wrote in messageAdmittedly, I haven't followed processor specs for a while, but initializing memory to zero should take about 1 cycle per 4 bytes, unless I'm totally confused. And on a 64 bit bus machine 1 cycle 8 bytes. (We're talking about long sequences of memory here, not small structs or single variables.) Does someone know this better?But this technic is useless within D, because of the automaatic preInitialization, which currently eats up about 3 cycles per byte.
Feb 01 2005
On 2005-02-01 16:17:17 -0600, Georg Wrede <georg.wrede nospam.org> said:No guys. Cycle counting is as old as MSDOS floppy disks. It's much more complicated than that these days. First of all, memory is transferred from RAM to L1 cache in a cache-line fill. 32 or 64 byte lines depending on the architecture. Then to L2 (on chip) cache. Various RAM types (ie: DDR) and cache read/write strategies will greatly affect how fast this is. Wait states are the problem area. A processor can sit there doing nothing for many "cycles" waiting for the bus and memory controller to get their ass in gear. Now, superscalar processors (Pentium onward) can read and write in parallel as long as the data does not depend on the previous results. Clearing memory would be such a case. So two or more write instructions could be paired and executed in one "cycle." And if SIMD instructions are being used, then we're talking even more throughput. There's just no way of counting anymore. It's an old practice that doesn't relate to current hardware anymore. For instance, on the PPC it's all about keeping data in the cache. You can have an otherwise high "cycle count" of code but if it keeps cache misses to a minimum then it will outperform a piece of code optimized on "cycle counting." The question is how much throughput and bandwidth do you have? The processor and its instruction set is not the issue. But all of this is irrelevant to me, because if you're wanting to do gigabyte bare-metal memory blits for video editing, it would be beyond me why you would be using D and expecting it to do what you want. Why not just use Java? That makes about as much sense. You need to know YOUR hardware and write bare-metal ASM to get what you need done if it's that vital. 
I don't use a can opener to peel a grapefruit just because it's brand new and shiny."Manfred Nowak" <svv1999 hotmail.com> wrote in messageAdmittedly, I haven't followed processor specs for a while, but initializing memory to zero should take about 1 cycle per 4 bytes, unless I'm totally confused. And on a 64-bit bus machine, 1 cycle per 8 bytes. (We're talking about long sequences of memory here, not small structs or single variables.)But this technique is useless within D, because of the automatic preInitialization, which currently eats up about 3 cycles per byte.
Feb 01 2005
But all of this is irrelevant to me, because if you're wanting to do gigabyte bare-metal memory blits for video editing, it would be beyond me why you would be using D and expecting it to do what you want. Why not just use Java? That makes about as much sense. You need to know YOUR hardware and write bare-metal ASM to get what you need done if it's that vital.D is aiming to support bare-metal programming, from what I gather from the Major Goals section of http://www.digitalmars.com/d/overview.html: "Provide low level bare metal access as required" D and Java are lightyears apart in terms of bare-metal access. The existing way to get gobs of uninitialized memory is to call std.c.stdlib.malloc and manage the memory by hand. That's fine and dandy but we want more. What was that old Queen song? "I want it all and I want it now"? Sounds good to me. :-)
Feb 01 2005
On 2005-02-01 20:59:16 -0600, "Ben Hinkle" <ben.hinkle gmail.com> said:D is aiming to support bare-metal programming, from what I gather from the Major Goals section of http://www.digitalmars.com/d/overview.html: "Provide low level bare metal access as required" D and Java are lightyears apart in terms of bare-metal access. The existing way to get gobs of uninitialized memory is to call std.c.stdlib.malloc and manage the memory by hand. That's fine and dandy but we want more. What was that old Queen song? "I want it all and I want it now"? Sounds good to me. :-)Sorry, Ben. I was out of line with the Java comment. A little moment of blunt humor got the better of me. ;-) D most certainly can't be compared with Java in that (and many other) respects. That wasn't the intent of my point. I'm just saying anyone who thinks they should be able to do a "ubyte[1<<30] videoData;" and thinks it *should* be optimal or else "we need to fix the compiler" deserves the headache they're going to get. ;-) But I'm afraid malloc isn't the answer either. More like direct memory mapping, ie: mmap/mlock.
Feb 03 2005
Brian Chapman wrote:But all of this is irrelevant to me, because if you're wanting to do gigabyte bare-metal memory blits for video editing, it would be beyond me why you would be using D and expecting it to do what you want. Why not just use Java? That makes about as much sense. You need to know YOUR hardware and write bare-metal ASM to get what you need done if it's that vital.Writing assembler to get performance is about as outdated as counting cycles. As I said before: if the code is simple enough to write it in assembler, it is also simple enough for a reasonable compiler to optimize it to the same extent. To exploit the full power of a modern processor, you have to do the right amount of loop unrolling, loop fusing, command interlacing and so on. You have to play with the data layout in memory, perhaps chunking arrays into smaller pieces. There are several more techniques to use when you want to make full use of pipelining, branch prediction, cache lines and so on. Languages like Fortran 95 would in principle allow the compiler to do all of this automatically (some good implementations are beginning to emerge). Doing all of it by hand in C results in complete spaghetti code, but it is possible if you know exactly what you are doing. (In a course we did, we eventually transformed one single loop into an equivalent of ~500 lines of highly optimized spaghetti. The result was ten times faster than the original and somewhere around 80% of the absolute theoretical limit of the processor.) The result was still pure C and therefore completely portable. The performance was, of course, tuned to one specific architecture, but there were basically constants to adjust for tuning it for about any modern processor. Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.) 
Furthermore, writing that kind of spaghetti code in C without introducing errors already takes a lot of discipline. Doing the same thing in assembler would probably land you in a psychiatric ward...
Feb 02 2005
Norbert Nemec wrote:Writing assembler to get performance is about as outdated as counting cycles. As I said before: if the code is simple enough to write it in assembler, it is also simple enough for a reasonable compiler to optimize it to the same extent.Of course, *reading* assembler is a good way to help write that good C code and is also a great help when debugging without the source code? So I still think learning to read (and write too, just for completeness) assembler is relevant, just as I think C is... Lots of people disagree*.Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.)You could be in for a surprise there, though. But I agree that writing assembly is a lot harder these days, in the post-RISC CPU era... These days, assembler and C are more useful for generating *small* code? Major loop unrolling and load/store reordering are a pain to do in asm. --anders * = Those darn Quiche Eaters. C and ASM is for us Real Programmers. :-)
Feb 02 2005
Anders F Björklund wrote:Norbert Nemec wrote:Of course: if you want to exploit your compiler you have to know it, so reading assembler might be a good idea once in a while...Writing assembler to get performance is about as outdated as counting cycles. As I said before: if the code is simple enough to write it in assembler, it is also simple enough for a reasonable compiler to optimize it to the same extent.Of course, *reading* assembler is a good way to help write that good C code and is also a great help when debugging without the source code?Not really. In that specific example (which was typical for numerics) the algorithm was given. It was known that the calculation needed a certain number of floating point operations. Each processor has some physical limit of floating point operations per second that it could theoretically achieve under absolute optimum conditions. No code in the world will ever break this limit. If you reach 80% of it with plain C code, you know that using assembler cannot give you much gain. No surprise is possible as long as you stick to the same algorithm.Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.)You could be in for a surprise there, though.These days, assembler and C are more useful for generating *small* code?Of course. Code-size was never a concern for me yet. I was only talking about performance. (Be aware though, that excessive code-bloat is bad for performance as well. The code-cache is limited too, so excessive loop-unrolling will kill performance.)
Feb 02 2005
Norbert Nemec wrote:Of course. Code-size was never a concern for me yet. I was only talking about performance. (Be aware though, that excessive code-bloat is bad for performance as well. The code-cache is limited too, so excessive loop-unrolling will kill performance.)I think the new GCC default on Mac OS X, -Os, is a fair trade-off ? It's the same as -O2, without the excessive code-heavy optimizations... http://gcc.gnu.org/onlinedocs/gcc-3.3.5/gcc/Optimize-Options.html It's a good all-round compiler setting. For systems that tune the output to the present computer, like Gentoo Linux, other flags might be in order that more specifically target the CPU. But it's very hard to "optimize for the general case", which is why Just-In-Time compilers and recompiling from source code are popular ? Problem with assembler is that it just isn't portable enough today. But I think the *performance* of D and DMD is more than good enough. Right now I'm more concerned about the bugs and ever getting to "1.0" (and porting GDC to the new GCC 4.0, would also be very interesting) --anders
Feb 02 2005
In article <ctq7qb$2lmg$1 digitaldaemon.com>, Anders F Björklund says...<snip>But I think the *performance* of D and DMD is more than good enough. Right now I'm more concerned about the bugs and ever getting to "1.0" (and porting GDC to the new GCC 4.0, would also be very interesting)I agree, with the exception of DMD floating point which I hope will be given some attention before 1.0. It's important to me and I think will actually turn out to be important to the overall acceptance of the language (and certainly DMD) come 1.0. I'm not talking about new array semantics, vectorizing, expression templates or anything like what Norbert has been speaking of lately; just plain old for(...) { PI * 2.0 * radius[i]; ...; } type of stuff. BTW - How close are they getting with GCC 4? I have not been following that lately. - Dave--anders
Feb 02 2005
Dave wrote:BTW - How close are they getting with GCC 4? I have not been following that lately.See http://gcc.gnu.org/develop.html#stage3, they're at the final stage. Apple is going to use it as the main system compiler in next Mac OS X. --anders
Feb 02 2005
Dave wrote:In article <ctq7qb$2lmg$1 digitaldaemon.com>, Anders F Björklund says...Have you tested the current floating point performance of gdc, compared to gcc/g++? This would give a clue about whether it is a problem of the front end or the DM backend. Are there any general comparisons of the code produced by the different compilers? (Not only "What *does* work?", like in the stress test, but also "How well does it work?")<snip>But I think the *performance* of D and DMD is more than good enough. Right now I'm more concerned about the bugs and ever getting to "1.0" (and porting GDC to the new GCC 4.0, would also be very interesting)I agree, with the exception of DMD floating point which I hope will be given some attention before 1.0.
Feb 02 2005
Norbert Nemec wrote: | Are there any general comparisons of the code produced by the | different compilers? (Not only "What *does* work?", like in the | stress test, but also "How well does it work?") Most of the comparisons are on the benchmark level http://www.prowiki.org/wiki4d/wiki.cgi?Benchmarks http://gcc.gnu.org/benchmarks/ http://shootout.alioth.debian.org/ I'm not aware of any current public compiler dissections. Thomas
Feb 02 2005
Norbert Nemec wrote:Are there any general comparisons of the code produced by the different compilers? (Not only "What *does* work?", like in the stress test, but also "How well does it work?")On Mac OS X, most of it is like "hooray, it compiles" :-) Benchmarks haven't been too bad, but currently GCC is quicker for most tasks (and then again gcc code generated on PPC is not all that good) When Mango compiles, and some of the more annoying D bugs like "void main()" are out, we can do some more testing. For now, DStress is a pretty good start... --anders
Feb 02 2005
In article <ctr5d5$lfd$1 digitaldaemon.com>, Norbert Nemec says...Dave wrote:In article <ctq7qb$2lmg$1 digitaldaemon.com>, Anders F Björklund says...Have you tested the current floating point performance of gdc, compared to gcc/g++? This would give a clue about whether it is a problem of the front end or the DM backend.<snip>But I think the *performance* of D and DMD is more than good enough. Right now I'm more concerned about the bugs and ever getting to "1.0" (and porting GDC to the new GCC 4.0, would also be very interesting)I agree, with the exception of DMD floating point which I hope will be given some attention before 1.0.There has been some of that for floating point posted here (on the NG) a while back -- oopack and scimark ported by Thomas Kuehn. IIRC, generally what it showed was that GDC significantly outperformed DMD and also, in the case of scimark, that GDC actually performed a bit better than GCC and was very close to Intel, so the frontend doesn't appear to be the issue. My own experience is that DMD is very good for int. A good example of this is that the gc generally seems to run faster for DMD than GDC. For the FP that I'm familiar with (not cache dependent heavy-duty numerics), it looks to me like DMD just needs to make as good use of the FP registers as it does of the GP registers ;) - Dave
Feb 02 2005
In general, rewriting C in assembly doesn't improve much, since modern compilers are good at optimizing general-purpose code. Where hand-tuned assembly can often boost the performance is with programs that may exploit MMX & SSE instructions. Norbert Nemec wrote:Writing assembler to get performance is about as outdated as counting cycles. As I said before: if the code is simple enough to write it in assembler, it is also simple enough for a reasonable compiler to optimize it to the same extent. To exploit the full power of a modern processor, you have to do the right amount of loop unrolling, loop fusing, command interlacing and so on. You have to play with the data layout in memory, perhaps chunking arrays into smaller pieces. There are several more techniques to use when you want to make full use of pipelining, branch prediction, cache lines and so on. Languages like Fortran 95 would in principle allow the compiler to do all of this automatically (some good implementations are beginning to emerge). Doing all of it by hand in C results in complete spaghetti code, but it is possible if you know exactly what you are doing. (In a course we did, we eventually transformed one single loop into an equivalent of ~500 lines of highly optimized spaghetti. The result was ten times faster than the original and somewhere around 80% of the absolute theoretical limit of the processor.) The result was still pure C and therefore completely portable. The performance was, of course, tuned to one specific architecture, but there were basically constants to adjust for tuning it for about any modern processor. Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.) Furthermore, writing that kind of spaghetti code in C without introducing errors already takes a lot of discipline. 
Doing the same thing in assembler would probably land you in a psychiatric ward...
Feb 02 2005
zwang wrote:In general, rewriting C in assembly doesn't improve much, since modern compilers are good at optimizing general-purpose code. Where hand-tuned assembly can often boost the performance is with programs that may exploit MMX & SSE instructions.Then again, modern compilers can use those instructions too... Here's for GDC's speedily porting to GCC 4.0, that does those. --anders
Feb 02 2005
In article <ctq2di$2gbd$1 digitaldaemon.com>, Norbert Nemec says...<snip>The result was still pure C and therefore completely portable. The performance was, of course, tuned to one specific architecture, but there were basically constants to adjust for tuning it for about any modern processor. Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.)If you still have that code handy, would it be possible to run it through DMD and GDC and post the results vs., say, GCC (and Intel C/++ if it's available)? Just out of curiosity... Thanks, - Dave
Feb 02 2005
Dave wrote:In article <ctq2di$2gbd$1 digitaldaemon.com>, Norbert Nemec says...Already found out to my disappointment that I don't have it on my local harddisk any more. Have to dig up some old backups... In any case, I would not expect very conclusive results. The code was plain ANSI C and did not depend on any compiler optimizations. Furthermore, it was tuned to a specific Alpha processor (which had a comparatively simple cache structure) The techniques were rather general, but the specifics were tuned exactly to that one machine which I don't have any access to any more. Anyhow: I'll try to dig up the code and see in which state it is.<snip>The result was still pure C and therefore completely portable. The performance was, of course, tuned to one specific architecture, but there were basically constants to adjust for tuning it for about any modern processor. Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.)If you still have that code handy, would it be possible to run it through DMD and GDC and post the results vs., say, GCC (and Intel C/++ if it's available)?
Feb 02 2005
On 2005-02-02 02:18:42 -0600, Norbert Nemec <Norbert Nemec-online.de> said:Writing assembler to get performance is about as outdated as counting cycles.I disagree with this statement so very much I wouldn't even know where to begin. Granted, it all depends on the situation and I'm not talking about writing generalized code. It's pointless to continue this because the argument is as old as a PDP-11 rotting away in an MIT basement. We could debate this till we're blue in the face; I won't convince you to break out an assembler and you won't convince me that a compiler can (or should) do it better. We're just going to have to agree to disagree even though I don't even know what the point of this thread is supposed to be anymore.As I said before: if the code is simple enough to write it in assembler, it is also simple enough for a reasonable compiler to optimize it to the same extent. To exploit the full power of a modern processor, you have to do the right amount of loop unrolling, loop fusing, command interlacing and so on. You have to play with the data layout in memory, perhaps chunking arrays into smaller pieces. There are several more techniques to use when you want to make full use of pipelining, branch prediction, cache lines and so on.Maybe. Or if you had a copy of the CPU's programmer's manual you could just inline a nice slick column of opcodes and do what you want exactly, instead of crossing your fingers when you type make or spending all day with a profiler trying various C idioms to various results. I'd rather take the compiler's assembly output, grumble once, rewrite it properly and inline it back in.Languages like Fortran 95 would in principle allow the compiler to do all of this automatically (some good implementations are beginning to emerge).Well you're most certainly never going to convince me to code in Fortran.Doing all of it by hand in C results in complete spaghetti code, but it is possible if you know exactly what you are doing. 
(In a course we did, we eventually transformed one single loop into an equivalent of ~500 lines of highly optimized spaghetti. The result was ten times faster than the original and somewhere around 80% of the absolute theoretical limit of the processor.500-line Duff's devices don't impress me. They make me want to do everybody a big favor and promptly delete the last copy of the offending source file on the spot.The result was still pure C and therefore completely portable. The performance was, of course, tuned to one specific architecture, but there were basically constants to adjust for tuning it for about any modern processor. Doing the same thing in assembler would probably not be much faster. (After you went from 8% to 80%, the remaining factor of 1.25 probably isn't worth the effort. 80% peak performance is already well beyond what people usually go for.) Furthermore, writing that kind of spaghetti code in C without introducing errors already takes a lot of discipline. Doing the same thing in assembler would probably land you in a psychiatric ward...Heh, you obviously don't know the machine. If that's the kind of stuff you want to write, you go for it, man. I'd rather just put some inline SIMD code in an asm block. I don't care how good you think your Fortran compiler or disciplined spaghetti code is, it's never gonna know how to fill up all vector pipelines to normalize 16 vectors for the price of 4 or fire off a DMA chain to blit gigabytes of data at max utilization. But it's a free world. You can loop-unroll, fuse, and chunk arrays if you want.
Feb 03 2005
Brian Chapman wrote:...it's never gonna know how to fill up all vector pipelines to normalize 16 vectors for the price of 4 or fire off a DMA chain to blit gigabytes of data at max utilization.Why?
Feb 03 2005
On 2005-02-03 09:59:01 -0600, Georg Wrede <georg.wrede nospam.org> said:Brian Chapman wrote:Excellent question. All enlightenment begins with asking a good "why?" If you really want to understand, I would invite you to start by reading some of the great information available at arstechnica.com....it's never gonna know how to fill up all vector pipelines to normalize 16 vectors for the price of 4 or fire off a DMA chain to blit gigabytes of data at max utilization.Why?
Feb 03 2005
Brian Chapman wrote:On 2005-02-03 09:59:01 -0600, Georg Wrede <georg.wrede nospam.org> said:Thank you!Brian Chapman wrote:Excellent question. All enlightenment begins with asking a good "why?"...it's never gonna know how to fill up all vector pipelines to normalize 16 vectors for the price of 4 or fire off a DMA chain to blit gigabytes of data at max utilization.Why?If you really want to understand, I would invite you to start by reading some of the great information available at arstechnica.com.Well, I at least hope the answer would be of general interest in this forum. Also, you seem to have a good idea of "why", based on the above quote. So, essentially a short(ish) answer would be appreciated.
Feb 04 2005
Brian Chapman wrote:On 2005-02-02 02:18:42 -0600, Norbert Nemec <Norbert Nemec-online.de> said:I agree - I've discussed with several people on this and hardly ever came to a conclusion. High-performace numerics experts would certainly agree on it. Old-school assemblists would never...Writing assembler to get performance is about as outdated as counting cycles.I disagree with this statement so very much I wouldn't even know where to begin. Granted it's all depends on the situation and I'm not talking about writing generalized code. It's pointless to continue this because the argument is as old as a PDP-11 rotting away in an MIT basement. We could debate this till were blue in the face I won't convince you to break out an assembler and you wont convince me that a compiler can (or should) do it better. Were just going to have to agree to disagree even though I don't even know what the point of this tread is supposed to be anymore.Well - do so, if you like to, just to realize that once you've spent hours optimizing your codeTo exploit the full power of a modern processor, you have to do the right amount of loop unrolling, loop fusing, command interlacing and so on. YOu have to play with the data layout in memory, perhaps chunking arrays into smaller pieces. There are several more techniques to use, when you want to make full use of pipelining, branch prediction, cache lines and so on.Maybe. Or if you had a copy of the CPU's programmer's manual you could just inline a nice slick column of opcodes and do what you want exactly instead of crossing your fingers when you type make or spending all day with a profiler trying various C idioms to various results. 
I'd rather take the compiler's assembly output, grumble once, rewrite it properly and inline it back in.

Me neither; that's why I would like to see the same features in D - so far, Fortran 95 is the only widespread language with that kind of performance.

Languages like Fortran 95 would in principle allow the compiler to do all of this automatically. (Some good implementations are beginning to emerge.)

Well, you're most certainly never going to convince me to code in Fortran.

The algorithm was simple but nontrivial: solving partial differential equations. The original code was not stupidly coded, but just straightforward, as anyone would write it at the first shot unless they think of the tricky issues of modern processor architecture. Back in the cycle-counting times, the latter version would have been even slower, since it did many integer operations that the original did not need.

Doing all of it by hand in C results in complete spaghetti code, but it is possible if you know exactly what you are doing. (In a course we did, we eventually transformed one single loop into an equivalent of ~500 lines of highly optimized spaghetti. The result was ten times faster than the original and somewhere around 80% of the absolute theoretical limit of the processor.)

500-line Duff's devices don't impress me. They make me want to do everybody a big favor and promptly delete the last copy of the offending source file on the spot.

Heh, you obviously don't know the machine. If that's the kind of stuff you want to write, you go for it, man. I'd rather just put some inline SIMD code in an asm block. I don't care how good you think your Fortran compiler or disciplined spaghetti code is; it's never gonna know how to fill up all vector pipelines to normalize 16 vectors for the price of 4 or fire off a DMA chain to blit gigabytes of data at max utilization.

Why shouldn't it?
As long as the compiler has the chance to reorder the instructions within certain constraints, and has enough intelligence built in to search for the optimum order, it may do a pretty good job at crunching the numbers and find something quite efficient. The behaviour of the pipeline follows very strict rules that differ for each architecture. You put all the rules into a file, and the compiler will optimize for a given architecture. Of course, this can only be done if the language gives the necessary flexibility. This is exactly the point why I believe that vectorized expressions in D are essential for high-performance computing.

But it's a free world. You can loop unroll, fuse, and chunk arrays if you want.

I don't care about doing that myself. I would like to teach it to a compiler.
Feb 04 2005
One potential solution: Formalize the existing (or semi-well known) method for "reserving space": i.e. X.reserve(N) => X.length = N; X.length = old_length; By creating a real method called "reserve" the array could have its cake and eat it too: reserve would be required to allocate the memory, but NOT to clear it; that would STILL happen when the length was bumped up, but now it could be done lazy-style. Additional benefit: objects which override [] could also do reserve(), and could have special behaviour which is smarter than adjusting length() twice. In most cases, they would just pass the savings down by using reserve() instead of length() on *their* underlying data structure. I'm including a simple memory bandwidth meter quickie. Kevin

:private import std.date;
:private import std.stdio;
:private import std.conv;
:
:int main(char[][] args)
:{
:    long N = 100;
:    long MB = 1024*1024;
:    long Z1 = 64*MB;
:    long Z = Z1;
:    char[] p;
:
:    if (args.length > 1) {
:        N = toInt(args[1]);
:    }
:
:    if (args.length > 2) {
:        Z = toInt(args[2]) * MB;
:    }
:
:    if (! N) {
:        N = 1;
:    }
:
:    if (Z < 1024) {
:        Z = 256*MB;
:    }
:
:    writef("Looping %s times.\n", N);
:    writef("Writing %s bytes/loop.\n", Z);
:
:    d_time t1 = getUTCtime();
:
:    for(int i = 0; i < N; i++) {
:        p.length = Z;
:        if (p[p.length / 3] == 'c') {
:            writef("Have C\n");
:        }
:
:        if (p[p.length - 1] == 'q') {
:            writef("Have Q\n");
:        }
:
:        p[p.length / 3] = 'c';
:        p[p.length - 1] = 'q';
:
:        p.length = 1234;
:    }
:
:    d_time t2 = getUTCtime();
:
:    double sec = ((t2-t1) + 0.0)/TicksPerSecond;
:
:    writef("Time elapsed = %s [res=%s/s].\n",
:        sec, TicksPerSecond);
:
:    writef("Mem b/w = %s MB / sec.\n", ((Z/MB)*N)/sec);
:
:    return 0;
:}
Feb 01 2005
"Manfred Nowak" <svv1999 hotmail.com> wrote in message news:ctmtne$2car$1 digitaldaemon.com...I wrote about this before. There is a well known time/space-tradeoff for the preInitialization of arrays: using about three times the space one can lazy initialize an array. But this technic is useless within D, because of the automaatic preInitialization, which currently eats up about 3 cycles per byte. Please awaken to, that on a 3GHz machine the busy preInitalization of one GB then lasts one second. And the coming 64-bit-machine will have up to some TB of main memory. Current mainboards can already hold up to 4GB, which ist the current main memory limit for win32. Check again, that to preInitialize one TB you have to wait more than 15 minutes only to wait at least another 15 minutes until your videoediting can start, if no precautions are taken. We need an option to switch automatic preInitialization of arrays off.There are several ways to create an array. If the array is statically initialized, it is initialized when it is demand paged in. There is no code generated to initialize it (in fact, there is no way to prevent this from happening!) Next, one can allocate arrays on the stack. These are normally initialized at runtime, but this can be turned off using the idiom outlined in www.digitalmars.com/d/memory.html#uninitializedarrays. And lastly, one can dynamically allocate arrays using new, in which case they are initialized, or using std.c.stdlib.malloc, in which case they are not, or any other allocator one wishes to use. P.S. there's no way to allocate a TB on the stack anyway <g> P.P.S. it's been suggested that the special initializer syntax: = void; mean "I know what I'm doing, don't initialize the variable" and I've been considering implementing it.
Feb 02 2005
P.P.S. it's been suggested that the special initializer syntax: = void; mean "I know what I'm doing, don't initialize the variable" and I've been considering implementing it.

I like it, but will it work with 'new'? When newing arrays and value types one might also not want to initialize.
Feb 02 2005
"Vathix" <vathix dprogramming.com> wrote in message news:opslk5o5vckcck4r esi...beenP.P.S. it's been suggested that the special initializer syntax: = void; mean "I know what I'm doing, don't initialize the variable" and I'veNo.considering implementing it.I like it, but will it work with 'new'?When newing arrays and value types one might also not want to initialize.True, but I don't think that's a good idea. The cases where initialization of an array *might* make a difference (the critical path in a program tends to be only in a small part of it) are so unusual it is not worth upsetting new. And frankly, uninitialized garbage in gc allocated data can cause problems with the mark/sweep algorithm, and would pull the rug out from doing a future type-aware gc. Use std.c.malloc for allocating uninitialized arrays; if it must be new'd, instead use a wrapper class that malloc's/free's an internal private array.
Feb 02 2005
Walter wrote:

There are several ways to create an array. If the array is statically initialized, it is initialized when it is demand paged in. There is no code generated to initialize it (in fact, there is no way to prevent this from happening!) Next, one can allocate arrays on the stack. These are normally initialized at runtime, but this can be turned off using the idiom outlined in www.digitalmars.com/d/memory.html#uninitializedarrays. And lastly, one can dynamically allocate arrays using new, in which case they are initialized, or using std.c.stdlib.malloc, in which case they are not, or any other allocator one wishes to use. P.S. there's no way to allocate a TB on the stack anyway <g> P.P.S. it's been suggested that the special initializer syntax: = void; mean "I know what I'm doing, don't initialize the variable" and I've been considering implementing it.

All this makes me think that the only thing that really needs to be done is for this to be added to the FAQ. D's current behaviour is more or less ideal as it stands: uninitialized memory can be acquired without fuss, but it won't ever be done by accident.

-- andy
Feb 02 2005