digitalmars.D - Q: fast stack allocation?

Kris (9/9) May 30 2004 How does one allocate an array on the stack without incurring the wrath ...

Walter (6/14) May 30 2004 a

Kris (10/28) May 30 2004 alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albe...

Walter (7/13) May 30 2004 a

Kris (51/64) May 30 2004 I have absolutely no argument with your point regarding uninitialized

Arcane Jill (3/6) May 30 2004 inline assembler?

Kris (7/14) May 30 2004 Yeah, I didn't dare mention that one AJ 'cos it seems ridiculous to go t...

Walter (16/62) May 30 2004 Copying a few bytes that are on the stack are in the cache already.

Kris (16/21) May 30 2004 a

Kris (2/3) May 30 2004 Errrr ... perhaps one bug (should have checked first).

Matthew (1/5) Jun 02 2004 Some of them are positively filthy! (More power to your elbow.)
Norbert Nemec (20/24) Jun 02 2004 Implicit initialization does not make programs more reliable at all. If ...

Matthew (6/30) Jun 04 2004 And another long, and eventually ignored, thread begins. I wish you the ...

J Anderson (6/94) May 30 2004 If performace is an issue, generally I try to pre-allocate the array.

Kris (11/14) May 31 2004 Me too, and seriously so. For example, mango.server and mango.servlet

Norbert Nemec (13/15) Jun 01 2004 If you forget to initialize a variable that should be initialized, that ...

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

How does one allocate an array on the stack without incurring the wrath of a
"clear everything to zero" loop? I've tried the obvious, such as

char[Size] array;

and

char[] array = (cast(char *) alloca(Size))[0..Size];

both of which insist on clearing the content to zero. I appreciate
deterministic state as much as the next person, but there are times when it
gets in the way of execution speed ...

- Kris

May 30 2004

"Walter" <newshound digitalmars.com> writes:

"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9c1l3$1tfp$1 digitaldaemon.com...
 How does one allocate an array on the stack without incurring the wrath of

a
 "clear everything to zero" loop? I've tried the obvious, such as

 char[Size] array;

 and

 char[] array = (cast(char *) alloca(Size))[0..Size];

 both of which insist on clearing the content to zero. I appreciate
 deterministic state as much as the next person, but there are times when

it
 gets in the way of execution speed ...

char[] array;
array = (cast(char *) alloca(Size))[0..Size];

May 30 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albeit a
shorter loop than < char[Size] array > emits.

Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet still
have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...

- Kris


"Walter" <newshound digitalmars.com> wrote in message
news:c9c7p0$260b$1 digitaldaemon.com...
 "Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
 news:c9c1l3$1tfp$1 digitaldaemon.com...
 How does one allocate an array on the stack without incurring the wrath


of
 a
 "clear everything to zero" loop? I've tried the obvious, such as

 char[Size] array;

 and

 char[] array = (cast(char *) alloca(Size))[0..Size];

 both of which insist on clearing the content to zero. I appreciate
 deterministic state as much as the next person, but there are times when

 it
 gets in the way of execution speed ...

 char[] array;
 array = (cast(char *) alloca(Size))[0..Size];

May 30 2004

"Walter" <newshound digitalmars.com> writes:

"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9d2n6$7hp$1 digitaldaemon.com...
 alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >, albeit

a
 shorter loop than < char[Size] array > emits.

That loop doesn't initialize the array, it only adjusts the stack frame to
make room. The array itself is uninitialized.

Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet still
have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...

Uninitialized variables are a huge source of hidden bugs in C/C++. Getting
rid of that is a design goal of D.

May 30 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

That adjustment of the stack frame is a 44 byte *copy* loop. With today's
12+ multiplier CPUs, that's a noticeable amount of time spent waiting for
memory access ...

What I'm doing here is making a scratchpad available for lower-level
functions to format content into. This has to be done on a thread-by-thread
basis, so the stack is a safe location to house said scratchpad. The
alternative is for me to provide a 'freelist' of preallocated buffers ... in
the latter case, the freelist would likely end up (over time) with as many
instances as there are threads running. If the number of threads is quite
large, and the buffer is of a reasonable size,  you can end up eating gobs
of memory instead of simply placing the buffer on the stack in the first
place. To get around this, you end up creating a "reaper" for the freelist.

This is not a good solution. To make matters worse, freelists must be
synchronized which adds more overhead. To add insult to injury, a
synchronized method often includes the hidden equivalent of a try/catch to
ensure the monitor is released. All this, just because I wanted to quickly
allocate a scratchpad on the stack? Good grief :-)

This issue has arisen simply because I want to ensure the overhead for
runtime logging/tracing is minimal. Surely this is a valid and practical
goal?

To illustrate, here's a simplified example:

        void test ()
        {
                // provide a scratchpad on the stack
                char[40] output;

                // do something with the returned slice
                uitoa (output, 17);
        }

        static char[] uitoa (char[] s, uint i)
        {
                uint len = s.length;
                while (i && len)
                      {
                      s[--len] = i % 10 + '0';
                      i /= 10;
                      }
                return s[len..s.length];
        }

That is, rather than uitoa() allocating from the heap (a la Phobos) it
populates and returns a slice on the provided buffer. You must realize that
the initialization of the caller's stack array is actually likely to cost
more (in time) than the uitoa() call itself for small numbers ...

There has to be a better solution, and I'd much appreciate your suggestions.

- Kris


"Walter" <newshound digitalmars.com> wrote in message
news:c9d510$akc$2 digitaldaemon.com...
 "Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
 news:c9d2n6$7hp$1 digitaldaemon.com...
 alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >,


albeit
 a
 shorter loop than < char[Size] array > emits.

 That loop doesn't initialize the array, it only adjusts the stack frame to
 make room. The array itself is uninitialized.

Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet


still
have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...

 Uninitialized variables are a huge source of hidden bugs in C/C++. Getting
 rid of that is a design goal of D.

May 30 2004

Arcane Jill <Arcane_member pathlink.com> writes:

In article <c9d8n1$fq6$1 digitaldaemon.com>, Kris says...
I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

inline assembler?

Jill

May 30 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

Yeah, I didn't dare mention that one AJ 'cos it seems ridiculous to go there
just for a fast stack array :-)

I'm not saying that D should follow the C/C++ mentality here (for fast stack
arrays), but surely this is a valid issue?

- Kris

"Arcane Jill" <Arcane_member pathlink.com> wrote in message
news:c9da9d$hp3$1 digitaldaemon.com...
 In article <c9d8n1$fq6$1 digitaldaemon.com>, Kris says...
I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

 inline assembler?

 Jill

May 30 2004

"Walter" <newshound digitalmars.com> writes:

"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9d8n1$fq6$1 digitaldaemon.com...
 I have absolutely no argument with your point regarding uninitialized
 variables, but there are occasions where a "practical programmer needs a
 practical language".

 That adjustment of the stack frame is a 44 byte *copy* loop. With today's
 12+ multiplier CPUs, that's a noticeable amount of time spent waiting for
 memory access ...

Copying a few bytes that are on the stack are in the cache already.

 What I'm doing here is making a scratchpad available for lower-level
 functions to format content into. This has to be done on a

thread-by-thread
 basis, so the stack is a safe location to house said scratchpad. The
 alternative is for me to provide a 'freelist' of preallocated buffers ...

in
 the latter case, the freelist would likely end up (over time) with as many
 instances as there are threads running. If the number of threads is quite
 large, and the buffer is of a reasonable size,  you can end up eating gobs
 of memory instead of simply placing the buffer on the stack in the first
 place. To get around this, you end up creating a "reaper" for the

freelist.
 This is not a good solution. To make matters worse, freelists must be
 synchronized which adds more overhead. To add insult to injury, a
 synchronized method often includes the hidden equivalent of a try/catch to
 ensure the monitor is released. All this, just because I wanted to quickly
 allocate a scratchpad on the stack? Good grief :-)

 This issue has arisen simply because I want to ensure the overhead for
 runtime logging/tracing is minimal. Surely this is a valid and practical
 goal?

 To illustrate, here's a simplified example:

         void test ()
         {
                 // provide a scratchpad on the stack
                 char[40] output;

                 // do something with the returned slice
                 uitoa (output, 17);
         }

         static char[] uitoa (char[] s, uint i)
         {
                 uint len = s.length;
                 while (i && len)
                       {
                       s[--len] = i % 10 + '0';
                       i /= 10;
                       }
                 return s[len..s.length];
         }

 That is, rather than uitoa() allocating from the heap (a la Phobos) it
 populates and returns a slice on the provided buffer. You must realize

that
 the initialization of the caller's stack array is actually likely to cost
 more (in time) than the uitoa() call itself for small numbers ...

 There has to be a better solution, and I'd much appreciate your

suggestions.

In most cases, a good optimizer will be able to eliminate the redundant
initialization. But not all cases. If the array is just a few bytes long, I
doubt you'll be able to detect it in a real app. For large arrays, alloca()
is the way to go.

I think the price of a bit of an occaisional over-initialization is worth
paying in exchange for more reliable programs. And I speak as one who has a
mania for detailed optimization and have a reputation for using dirty tricks
to get it.

May 30 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

"Walter" wrote
 Copying a few bytes that are on the stack are in the cache already.

If so, then fair point ... (though it would hurt a typical MCU).

 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has

a
 mania for detailed optimization and have a reputation for using dirty

tricks
 to get it.

I was going to pull you on the latter, but thought it might be out-of-order
<g>. However, let's not confuse the current issue with a dirty trick; it's a
perfectly reasonable and legitimate thing to do. And it's not even close to
manic optimization. If you disagree with those two points, then please say
so :-)

Let me ask a question (or two), and then I'll drop it :: suppose there were
some reasonable assignment syntax that permitted the programmer to omit
default array initialization under carefully considered circumstances, would
you still claim that alloca() is the right way to go, even with its
overhead? For all purposes?

If nothing else, the exercise uncovered two bugs (in the buglist) ...

- Kris

May 30 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

"Kris"  wrote in
 If nothing else, the exercise uncovered two bugs (in the buglist) ...

Errrr ... perhaps one bug (should have checked first).

May 30 2004

"Matthew" <matthew.hat stlsoft.dot.org> writes:

 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has a
 mania for detailed optimization and have a reputation for using dirty tricks
 to get it.

Some of them are positively filthy! (More power to your elbow.)

Jun 02 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Walter wrote:

 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has
 a mania for detailed optimization and have a reputation for using dirty
 tricks to get it.

Implicit initialization does not make programs more reliable at all. If the
default value is what you want for the variable, implicit initialization
saves a bit of typing. If the variable needs to be initialized to some
other value but you forget to do this, the default value is just as
incorrect as any random value that might have been in that memory location
by chance.

Implicit initialization makes bugs behave more predictable and portable, but
they stay bugs. Furthermore, they will even be harder to localize, because
you never know whether the programmer really intended the variable to be
initialized with the default value, or whether he just forgot some explicit
initialization. C/C++ may warn if an initialization was forgotten. D will
just accept the code, since there is no way to recognize that it has wrong
semantics.

If you want to prevent bugs, go for *explicit initialization* demanded by
the language. Just call it an *error* if the compiler cannot assert that a
variable was explicitely initialized before the first use. (In C/C++, this
is just a warning which might be ignored. To turn it into an error, the
specs might have to be slightly restricted compared to C/C++, no idea
whether this is possible.)

Jun 02 2004

"Matthew" <matthew.hat stlsoft.dot.org> writes:

"Norbert Nemec" <Norbert.Nemec gmx.de> wrote in message
news:c9kmvm$1v8g$1 digitaldaemon.com...
 Walter wrote:

 I think the price of a bit of an occaisional over-initialization is worth
 paying in exchange for more reliable programs. And I speak as one who has
 a mania for detailed optimization and have a reputation for using dirty
 tricks to get it.

 Implicit initialization does not make programs more reliable at all. If the
 default value is what you want for the variable, implicit initialization
 saves a bit of typing. If the variable needs to be initialized to some
 other value but you forget to do this, the default value is just as
 incorrect as any random value that might have been in that memory location
 by chance.

 Implicit initialization makes bugs behave more predictable and portable, but
 they stay bugs. Furthermore, they will even be harder to localize, because
 you never know whether the programmer really intended the variable to be
 initialized with the default value, or whether he just forgot some explicit
 initialization. C/C++ may warn if an initialization was forgotten. D will
 just accept the code, since there is no way to recognize that it has wrong
 semantics.

 If you want to prevent bugs, go for *explicit initialization* demanded by
 the language. Just call it an *error* if the compiler cannot assert that a
 variable was explicitely initialized before the first use. (In C/C++, this
 is just a warning which might be ignored. To turn it into an error, the
 specs might have to be slightly restricted compared to C/C++, no idea
 whether this is possible.)

And another long, and eventually ignored, thread begins. I wish you the best of
luck.

:-)

Derek the Doomed

Jun 04 2004

J Anderson <REMOVEanderson badmama.com.au> writes:

Kris wrote:

I have absolutely no argument with your point regarding uninitialized
variables, but there are occasions where a "practical programmer needs a
practical language".

That adjustment of the stack frame is a 44 byte *copy* loop. With today's
12+ multiplier CPUs, that's a noticeable amount of time spent waiting for
memory access ...

What I'm doing here is making a scratchpad available for lower-level
functions to format content into. This has to be done on a thread-by-thread
basis, so the stack is a safe location to house said scratchpad. The
alternative is for me to provide a 'freelist' of preallocated buffers ... in
the latter case, the freelist would likely end up (over time) with as many
instances as there are threads running. If the number of threads is quite
large, and the buffer is of a reasonable size,  you can end up eating gobs
of memory instead of simply placing the buffer on the stack in the first
place. To get around this, you end up creating a "reaper" for the freelist.

This is not a good solution. To make matters worse, freelists must be
synchronized which adds more overhead. To add insult to injury, a
synchronized method often includes the hidden equivalent of a try/catch to
ensure the monitor is released. All this, just because I wanted to quickly
allocate a scratchpad on the stack? Good grief :-)

This issue has arisen simply because I want to ensure the overhead for
runtime logging/tracing is minimal. Surely this is a valid and practical
goal?

To illustrate, here's a simplified example:

        void test ()
        {
                // provide a scratchpad on the stack
                char[40] output;

                // do something with the returned slice
                uitoa (output, 17);
        }

        static char[] uitoa (char[] s, uint i)
        {
                uint len = s.length;
                while (i && len)
                      {
                      s[--len] = i % 10 + '0';
                      i /= 10;
                      }
                return s[len..s.length];
        }

That is, rather than uitoa() allocating from the heap (a la Phobos) it
populates and returns a slice on the provided buffer. You must realize that
the initialization of the caller's stack array is actually likely to cost
more (in time) than the uitoa() call itself for small numbers ...

There has to be a better solution, and I'd much appreciate your suggestions.

- Kris
  

If performace is an issue, generally I try to pre-allocate the array.  
Obviously there are circumstances where that doesn't work but generally 
it's sufficient.

"Walter" <newshound digitalmars.com> wrote in message
news:c9d510$akc$2 digitaldaemon.com...
  

"Kris" <someidiot earthlink.dot.dot.dot.net> wrote in message
news:c9d2n6$7hp$1 digitaldaemon.com...
    

alloca() still does a < rep movs dword ptr [edi],dword ptr [esi] >,
      


albeit
  

a
    

shorter loop than < char[Size] array > emits.
      

That loop doesn't initialize the array, it only adjusts the stack frame to
make room. The array itself is uninitialized.

    

Surely there must be a simple means of circumventing all this overhead? I
mean, I'm allocating on the stack so as to avoid heap management: yet
      


still
  

have all this to contend with. Whatever happened to <sub sp, Size> type
stack allocation? There's still a real need for that ...
      

Uninitialized variables are a huge source of hidden bugs in C/C++. Getting
rid of that is a design goal of D.


    


  


-- 
-Anderson: http://badmama.com.au/~anderson/

May 30 2004

"Kris" <someidiot earthlink.dot.dot.dot.net> writes:

"J Anderson" wrote
 If performace is an issue, generally I try to pre-allocate the array.
 Obviously there are circumstances where that doesn't work but generally
 it's sufficient.

Me too, and seriously so. For example, mango.server and mango.servlet
happily process each HTTP request without making a single heap allocation.
That includes all the header/params/url parsing & decoding etc, and is why
those two make arguably the fastest HTTP/servlet engine in existance.

However, in this particular case there should be absolutely no need to go to
the trouble of freelists and/or thread-local-storage (like mango.server et.
al. do). Those options incurr a reasonable amount of development and runtine
overhead for what really should be a no-brainer operation.

Other than that, I fully agree with your sentiment.

- Kris

May 31 2004

Norbert Nemec <Norbert.Nemec gmx.de> writes:

Walter wrote:

 Uninitialized variables are a huge source of hidden bugs in C/C++. Getting
 rid of that is a design goal of D.

If you forget to initialize a variable that should be initialized, that will
be a bug in D just the same way that it would be one in C/C++. The only
difference is, that the behavior of the buggy program is deterministic.

Even worse: In C/C++, a good compiler can give a warning if a variable may
be read uninitialized. In D, this is not possible, because a variable is
always set to some default value, so it is legal to read it.

Default values in D are a great feature of practical use, but as a means to
prevent bugs, they will not help at all. If the default value is what you
want, then implicit initialization might help. For pointers/references,
initializing to zero by default is reasonable. But forcing every type to be
initialized to something does not help at all but even is counterproductive
(preventing the compiler from throwing reasonable warnings.)

Jun 01 2004

D Programming

C/C++ Programming

Other

digitalmars.D - Q: fast stack allocation?