digitalmars.D - Q: fast stack allocation?
- Kris (9/9) May 30 2004 How does one allocate an array on the stack without incurring the wrath ...
- Walter (6/14) May 30 2004 a
- Kris (10/28) May 30 2004 alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albe...
- Walter (7/13) May 30 2004 a
- Kris (51/64) May 30 2004 I have absolutely no argument with your point regarding uninitialized
- Arcane Jill (3/6) May 30 2004 inline assembler?
- Kris (7/14) May 30 2004 Yeah, I didn't dare mention that one AJ 'cos it seems ridiculous to go t...
- Walter (16/62) May 30 2004 Copying a few bytes that are on the stack are in the cache already.
- Kris (16/21) May 30 2004 a
- Kris (2/3) May 30 2004 Errrr ... perhaps one bug (should have checked first).
- Matthew (1/5) Jun 02 2004 Some of them are positively filthy! (More power to your elbow.)
- Norbert Nemec (20/24) Jun 02 2004 Implicit initialization does not make programs more reliable at all. If ...
- Matthew (6/30) Jun 04 2004 And another long, and eventually ignored, thread begins. I wish you the ...
- J Anderson (6/94) May 30 2004 If performace is an issue, generally I try to pre-allocate the array.
- Kris (11/14) May 31 2004 Me too, and seriously so. For example, mango.server and mango.servlet
- Norbert Nemec (13/15) Jun 01 2004 If you forget to initialize a variable that should be initialized, that ...
How does one allocate an array on the stack without incurring the wrath of a "clear everything to zero" loop? I've tried the obvious, such as char[Size] array; and char[] array = (cast(char *) alloca(Size))[0..Size]; both of which insist on clearing the content to zero. I appreciate deterministic state as much as the next person, but there are times when it gets in the way of execution speed ... - Kris
May 30 2004
"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message news:c9c1l3$1tfp$1 digitaldaemon.com...How does one allocate an array on the stack without incurring the wrath ofa"clear everything to zero" loop? I've tried the obvious, such as char[Size] array; and char[] array = (cast(char *) alloca(Size))[0..Size]; both of which insist on clearing the content to zero. I appreciate deterministic state as much as the next person, but there are times whenitgets in the way of execution speed ...char[] array; array = (cast(char *) alloca(Size))[0..Size];
May 30 2004
alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albeit a shorter loop than < char[Size] array > emits. Surely there must be a simple means of circumventing all this overhead? I mean, I'm allocating on the stack so as to avoid heap management: yet still have all this to contend with. Whatever happened to <sub sp, Size> type stack allocation? There's still a real need for that ... - Kris "Walter" <newshound digitalmars.com> wrote in message news:c9c7p0$260b$1 digitaldaemon.com..."Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message news:c9c1l3$1tfp$1 digitaldaemon.com...ofHow does one allocate an array on the stack without incurring the wratha"clear everything to zero" loop? I've tried the obvious, such as char[Size] array; and char[] array = (cast(char *) alloca(Size))[0..Size]; both of which insist on clearing the content to zero. I appreciate deterministic state as much as the next person, but there are times whenitgets in the way of execution speed ...char[] array; array = (cast(char *) alloca(Size))[0..Size];
May 30 2004
"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message news:c9d2n6$7hp$1 digitaldaemon.com...alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albeitashorter loop than < char[Size] array > emits.That loop doesn't initialize the array, it only adjusts the stack frame to make room. The array itself is uninitialized.Surely there must be a simple means of circumventing all this overhead? I mean, I'm allocating on the stack so as to avoid heap management: yet still have all this to contend with. Whatever happened to <sub sp, Size> type stack allocation? There's still a real need for that ...Uninitialized variables are a huge source of hidden bugs in C/C++. Getting rid of that is a design goal of D.
May 30 2004
I have absolutely no argument with your point regarding uninitialized variables, but there are occasions where a "practical programmer needs a practical language". That adjustment of the stack frame is a 44 byte *copy* loop. With today's 12+ multiplier CPUs, that's a noticeable amount of time spent waiting for memory access ... What I'm doing here is making a scratchpad available for lower-level functions to format content into. This has to be done on a thread-by-thread basis, so the stack is a safe location to house said scratchpad. The alternative is for me to provide a 'freelist' of preallocated buffers ... in the latter case, the freelist would likely end up (over time) with as many instances as there are threads running. If the number of threads is quite large, and the buffer is of a reasonable size, you can end up eating gobs of memory instead of simply placing the buffer on the stack in the first place. To get around this, you end up creating a "reaper" for the freelist. This is not a good solution. To make matters worse, freelists must be synchronized which adds more overhead. To add insult to injury, a synchronized method often includes the hidden equivalent of a try/catch to ensure the monitor is released. All this, just because I wanted to quickly allocate a scratchpad on the stack? Good grief :-) This issue has arisen simply because I want to ensure the overhead for runtime logging/tracing is minimal. Surely this is a valid and practical goal? To illustrate, here's a simplified example: void test () { // provide a scratchpad on the stack char[40] output; // do something with the returned slice uitoa (output, 17); } static char[] uitoa (char[] s, uint i) { uint len = s.length; while (i && len) { s[--len] = i % 10 + '0'; i /= 10; } return s[len..s.length]; } That is, rather than uitoa() allocating from the heap (a la Phobos) it populates and returns a slice on the provided buffer. You must realize that the initialization of the caller's stack array is actually likely to cost more (in time) than the uitoa() call itself for small numbers ... There has to be a better solution, and I'd much appreciate your suggestions. - Kris "Walter" <newshound digitalmars.com> wrote in message news:c9d510$akc$2 digitaldaemon.com..."Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message news:c9d2n6$7hp$1 digitaldaemon.com...albeitalloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >,astillshorter loop than < char[Size] array > emits.That loop doesn't initialize the array, it only adjusts the stack frame to make room. The array itself is uninitialized.Surely there must be a simple means of circumventing all this overhead? I mean, I'm allocating on the stack so as to avoid heap management: yethave all this to contend with. Whatever happened to <sub sp, Size> type stack allocation? There's still a real need for that ...Uninitialized variables are a huge source of hidden bugs in C/C++. Getting rid of that is a design goal of D.
May 30 2004
In article <c9d8n1$fq6$1 digitaldaemon.com>, Kris says...I have absolutely no argument with your point regarding uninitialized variables, but there are occasions where a "practical programmer needs a practical language".inline assembler? Jill
May 30 2004
Yeah, I didn't dare mention that one AJ 'cos it seems ridiculous to go there just for a fast stack array :-) I'm not saying that D should follow the C/C++ mentality here (for fast stack arrays), but surely this is a valid issue? - Kris "Arcane Jill" <Arcane_member pathlink.com> wrote in message news:c9da9d$hp3$1 digitaldaemon.com...In article <c9d8n1$fq6$1 digitaldaemon.com>, Kris says...I have absolutely no argument with your point regarding uninitialized variables, but there are occasions where a "practical programmer needs a practical language".inline assembler? Jill
May 30 2004
"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message news:c9d8n1$fq6$1 digitaldaemon.com...I have absolutely no argument with your point regarding uninitialized variables, but there are occasions where a "practical programmer needs a practical language". That adjustment of the stack frame is a 44 byte *copy* loop. With today's 12+ multiplier CPUs, that's a noticeable amount of time spent waiting for memory access ...Copying a few bytes that are on the stack are in the cache already.What I'm doing here is making a scratchpad available for lower-level functions to format content into. This has to be done on athread-by-threadbasis, so the stack is a safe location to house said scratchpad. The alternative is for me to provide a 'freelist' of preallocated buffers ...inthe latter case, the freelist would likely end up (over time) with as many instances as there are threads running. If the number of threads is quite large, and the buffer is of a reasonable size, you can end up eating gobs of memory instead of simply placing the buffer on the stack in the first place. To get around this, you end up creating a "reaper" for thefreelist.This is not a good solution. To make matters worse, freelists must be synchronized which adds more overhead. To add insult to injury, a synchronized method often includes the hidden equivalent of a try/catch to ensure the monitor is released. All this, just because I wanted to quickly allocate a scratchpad on the stack? Good grief :-) This issue has arisen simply because I want to ensure the overhead for runtime logging/tracing is minimal. Surely this is a valid and practical goal? To illustrate, here's a simplified example: void test () { // provide a scratchpad on the stack char[40] output; // do something with the returned slice uitoa (output, 17); } static char[] uitoa (char[] s, uint i) { uint len = s.length; while (i && len) { s[--len] = i % 10 + '0'; i /= 10; } return s[len..s.length]; } That is, rather than uitoa() allocating from the heap (a la Phobos) it populates and returns a slice on the provided buffer. You must realizethatthe initialization of the caller's stack array is actually likely to cost more (in time) than the uitoa() call itself for small numbers ... There has to be a better solution, and I'd much appreciate yoursuggestions. In most cases, a good optimizer will be able to eliminate the redundant initialization. But not all cases. If the array is just a few bytes long, I doubt you'll be able to detect it in a real app. For large arrays, alloca() is the way to go. I think the price of a bit of an occaisional over-initialization is worth paying in exchange for more reliable programs. And I speak as one who has a mania for detailed optimization and have a reputation for using dirty tricks to get it.
May 30 2004
"Walter" wroteCopying a few bytes that are on the stack are in the cache already.If so, then fair point ... (though it would hurt a typical MCU).I think the price of a bit of an occaisional over-initialization is worth paying in exchange for more reliable programs. And I speak as one who hasamania for detailed optimization and have a reputation for using dirtytricksto get it.I was going to pull you on the latter, but thought it might be out-of-order <g>. However, let's not confuse the current issue with a dirty trick; it's a perfectly reasonable and legitimate thing to do. And it's not even close to manic optimization. If you disagree with those two points, then please say so :-) Let me ask a question (or two), and then I'll drop it :: suppose there were some reasonable assignment syntax that permitted the programmer to omit default array initialization under carefully considered circumstances, would you still claim that alloca() is the right way to go, even with its overhead? For all purposes? If nothing else, the exercise uncovered two bugs (in the buglist) ... - Kris
May 30 2004
"Kris" wrote inIf nothing else, the exercise uncovered two bugs (in the buglist) ...Errrr ... perhaps one bug (should have checked first).
May 30 2004
I think the price of a bit of an occaisional over-initialization is worth paying in exchange for more reliable programs. And I speak as one who has a mania for detailed optimization and have a reputation for using dirty tricks to get it.Some of them are positively filthy! (More power to your elbow.)
Jun 02 2004
Walter wrote:I think the price of a bit of an occaisional over-initialization is worth paying in exchange for more reliable programs. And I speak as one who has a mania for detailed optimization and have a reputation for using dirty tricks to get it.Implicit initialization does not make programs more reliable at all. If the default value is what you want for the variable, implicit initialization saves a bit of typing. If the variable needs to be initialized to some other value but you forget to do this, the default value is just as incorrect as any random value that might have been in that memory location by chance. Implicit initialization makes bugs behave more predictable and portable, but they stay bugs. Furthermore, they will even be harder to localize, because you never know whether the programmer really intended the variable to be initialized with the default value, or whether he just forgot some explicit initialization. C/C++ may warn if an initialization was forgotten. D will just accept the code, since there is no way to recognize that it has wrong semantics. If you want to prevent bugs, go for *explicit initialization* demanded by the language. Just call it an *error* if the compiler cannot assert that a variable was explicitely initialized before the first use. (In C/C++, this is just a warning which might be ignored. To turn it into an error, the specs might have to be slightly restricted compared to C/C++, no idea whether this is possible.)
Jun 02 2004
"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message news:c9kmvm$1v8g$1 digitaldaemon.com...Walter wrote:And another long, and eventually ignored, thread begins. I wish you the best of luck. :-) Derek the DoomedI think the price of a bit of an occaisional over-initialization is worth paying in exchange for more reliable programs. And I speak as one who has a mania for detailed optimization and have a reputation for using dirty tricks to get it.Implicit initialization does not make programs more reliable at all. If the default value is what you want for the variable, implicit initialization saves a bit of typing. If the variable needs to be initialized to some other value but you forget to do this, the default value is just as incorrect as any random value that might have been in that memory location by chance. Implicit initialization makes bugs behave more predictable and portable, but they stay bugs. Furthermore, they will even be harder to localize, because you never know whether the programmer really intended the variable to be initialized with the default value, or whether he just forgot some explicit initialization. C/C++ may warn if an initialization was forgotten. D will just accept the code, since there is no way to recognize that it has wrong semantics. If you want to prevent bugs, go for *explicit initialization* demanded by the language. Just call it an *error* if the compiler cannot assert that a variable was explicitely initialized before the first use. (In C/C++, this is just a warning which might be ignored. To turn it into an error, the specs might have to be slightly restricted compared to C/C++, no idea whether this is possible.)
Jun 04 2004
Kris wrote:I have absolutely no argument with your point regarding uninitialized variables, but there are occasions where a "practical programmer needs a practical language". That adjustment of the stack frame is a 44 byte *copy* loop. With today's 12+ multiplier CPUs, that's a noticeable amount of time spent waiting for memory access ... What I'm doing here is making a scratchpad available for lower-level functions to format content into. This has to be done on a thread-by-thread basis, so the stack is a safe location to house said scratchpad. The alternative is for me to provide a 'freelist' of preallocated buffers ... in the latter case, the freelist would likely end up (over time) with as many instances as there are threads running. If the number of threads is quite large, and the buffer is of a reasonable size, you can end up eating gobs of memory instead of simply placing the buffer on the stack in the first place. To get around this, you end up creating a "reaper" for the freelist. This is not a good solution. To make matters worse, freelists must be synchronized which adds more overhead. To add insult to injury, a synchronized method often includes the hidden equivalent of a try/catch to ensure the monitor is released. All this, just because I wanted to quickly allocate a scratchpad on the stack? Good grief :-) This issue has arisen simply because I want to ensure the overhead for runtime logging/tracing is minimal. Surely this is a valid and practical goal? To illustrate, here's a simplified example: void test () { // provide a scratchpad on the stack char[40] output; // do something with the returned slice uitoa (output, 17); } static char[] uitoa (char[] s, uint i) { uint len = s.length; while (i && len) { s[--len] = i % 10 + '0'; i /= 10; } return s[len..s.length]; } That is, rather than uitoa() allocating from the heap (a la Phobos) it populates and returns a slice on the provided buffer. You must realize that the initialization of the caller's stack array is actually likely to cost more (in time) than the uitoa() call itself for small numbers ... There has to be a better solution, and I'd much appreciate your suggestions. - KrisIf performace is an issue, generally I try to pre-allocate the array. Obviously there are circumstances where that doesn't work but generally it's sufficient."Walter" <newshound digitalmars.com> wrote in message news:c9d510$akc$2 digitaldaemon.com...-- -Anderson: http://badmama.com.au/~anderson/"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message news:c9d2n6$7hp$1 digitaldaemon.com...albeitalloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >,astillshorter loop than < char[Size] array > emits.That loop doesn't initialize the array, it only adjusts the stack frame to make room. The array itself is uninitialized.Surely there must be a simple means of circumventing all this overhead? I mean, I'm allocating on the stack so as to avoid heap management: yethave all this to contend with. Whatever happened to <sub sp, Size> type stack allocation? There's still a real need for that ...Uninitialized variables are a huge source of hidden bugs in C/C++. Getting rid of that is a design goal of D.
May 30 2004
"J Anderson" wroteIf performace is an issue, generally I try to pre-allocate the array. Obviously there are circumstances where that doesn't work but generally it's sufficient.Me too, and seriously so. For example, mango.server and mango.servlet happily process each HTTP request without making a single heap allocation. That includes all the header/params/url parsing & decoding etc, and is why those two make arguably the fastest HTTP/servlet engine in existance. However, in this particular case there should be absolutely no need to go to the trouble of freelists and/or thread-local-storage (like mango.server et. al. do). Those options incurr a reasonable amount of development and runtine overhead for what really should be a no-brainer operation. Other than that, I fully agree with your sentiment. - Kris
May 31 2004
Walter wrote:Uninitialized variables are a huge source of hidden bugs in C/C++. Getting rid of that is a design goal of D.If you forget to initialize a variable that should be initialized, that will be a bug in D just the same way that it would be one in C/C++. The only difference is, that the behavior of the buggy program is deterministic. Even worse: In C/C++, a good compiler can give a warning if a variable may be read uninitialized. In D, this is not possible, because a variable is always set to some default value, so it is legal to read it. Default values in D are a great feature of practical use, but as a means to prevent bugs, they will not help at all. If the default value is what you want, then implicit initialization might help. For pointers/references, initializing to zero by default is reasonable. But forcing every type to be initialized to something does not help at all but even is counterproductive (preventing the compiler from throwing reasonable warnings.)
Jun 01 2004