digitalmars.D - Stack-allocated arrays
- dsimcha (26/26) Nov 11 2008 I've noticed that, when writing multithreaded number crunching code,
- Andrei Alexandrescu (6/38) Nov 11 2008 I, for one, am very interested and pleaded to Walter to introduce
- Denis Koroskin (9/53) Nov 11 2008 I would *LOVE* to see Variable-Length Arrays in D (they are implemented ...
- dsimcha (9/17) Nov 11 2008 Yes, a few people have said this in the past. I'm suggesting something ...
- Kagamin (1/1) Nov 11 2008 Since your code is designed to work in this way, you can implement alloc...
- Frits van Bommel (33/64) Nov 11 2008 I'd love for "scope foo = new T[len];" to do for arrays what "scope bar
- dsimcha (3/12) Nov 11 2008 If I'm going to do something this hackish and ugly, I may as well just u...
- Dave (17/98) Nov 11 2008 I'm surprised it doesn't and see that as a bit inconsistent, with the on...
- Janderson (24/48) Nov 11 2008 As a work around, I imagine it would be possible to write a template
- KennyTM~ (2/58) Nov 12 2008 But this won't work if size is runtime-determined.
- Janderson (3/62) Nov 12 2008 Thanks for clarifying my last point :)
- KennyTM~ (12/76) Nov 12 2008 Is it? I mean things like:
- Max Samukha (18/94) Nov 12 2008 Mixin brute force:
- Andrei Alexandrescu (29/34) Nov 12 2008 This entire discussion gets me thinking - could an alternate stack be a
- Sean Kelly (20/49) Nov 12 2008 It's an implementation detail, but when the owner thread dies it should
- Andrei Alexandrescu (21/73) Nov 12 2008 Speed. An allocation/deallocation is most of the time a counter bump and...
- Sean Kelly (13/61) Nov 12 2008 Or delete them if it's newed them :-) Assuming we want the GC to
- Don (13/40) Nov 12 2008 It wouldn't even need to return the memory to the OS. It could even
- Denis Koroskin (10/53) Nov 12 2008 Not only it is almost as fast as stack allocation, it is much safer (you...
- Sean Kelly (10/14) Nov 12 2008 It makes sense. Consider how futures work:
- Dave (9/44) Nov 12 2008 That would be interesting and even better if it could be a built-in (not
- Dave (4/61) Nov 12 2008 Oh, and any built-in syntax needs to provide a way to allocate uninitial...
- dsimcha (9/43) Nov 13 2008 That sounds terrific, although I think it would be even better if it wer...
- Tom S (9/11) Nov 14 2008 Yes, absolutely. I use lots of stack allocations and the vision of an
- Janderson (7/88) Nov 12 2008 I assumed you knew the size at compile time (like in Dave's example) and...
- Derek Parnell (9/15) Nov 11 2008 Would there be a problem if multiple stack-arrays were asked for. It mig...
- Kagamin (10/18) Nov 13 2008 void foo(in size_t s)
- Andrei Alexandrescu (3/23) Nov 13 2008 Sweet!!
I've noticed that, when writing multithreaded number crunching code, allocating memory for temporary storage can be a huge threading bottleneck. Of course, the obvious, but ugly solution is to pre-allocate and recycle temp variables. However, this breaks encapsulation at every place imaginable.

An alternative that I've tried recently is to use alloca to stack allocate the temporary arrays if they are small enough. If the arrays being allocated are larger, I heap allocate, and this isn't a problem because in the large array case, the number crunching takes up enough time that heap allocations aren't much of a bottleneck anymore. However, this is also butt ugly because I have to use a bunch of pointers and casts and explicitly test the size for it to work.

Given that this is highly useful, I think a nice feature for D would be to have some kind of a scheme to make this clean. For example:

auto foo = newTemp[length];

If length is larger than some arbitrary but reasonable value, this is heap allocated. If length is small, it is stack allocated. Either way, allowing the array to escape the current scope is undefined behavior and the compiler would detect at least the simple cases of this. Ideally, since this would only be used for temp variables in performance-critical code, the contents of the array would also not be initialized.

It would be nice to do this in a library, but there's no obvious way to do it without compiler support, since calling a function sets up a new stack frame. I'd like to just do it with an interface like: auto foo = newTemp!(T)(length). Anyone know any ASM frame pointer hacks to make something like this workable? Otherwise, is there enough interest in this that it might belong in the core language?
Nov 11 2008
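[Editor's sketch] A minimal D sketch of the alloca-or-heap pattern described in the post above; the 4 KB cutoff and all names are illustrative only, and alloca comes from std.c.stdlib:

-----
import std.c.stdlib; // for alloca

void crunch(size_t n)
{
    const size_t maxStackBytes = 4096; // arbitrary but reasonable cutoff

    double[] tmp;
    if (n * double.sizeof <= maxStackBytes)
    {
        // Small case: stack-allocate. The cast/slice dance is exactly the
        // ugliness the post complains about.
        tmp = (cast(double*) alloca(n * double.sizeof))[0 .. n];
    }
    else
    {
        // Large case: heap allocation, amortized by the work done on the array.
        tmp = new double[n];
    }

    // ... number crunching on tmp; it must not escape this function ...
}
-----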
dsimcha wrote:I've noticed that, when writing multithreaded number crunching code, allocating memory for temporary storage can be a huge threading bottleneck. Of course, the obvious, but ugly solution is to pre-allocate and recycle temp variables. However, this breaks encapsulation at every place imaginable. An alternative that I've tried recently is to use alloca to stack allocate the temporary arrays if they are small enough. If the arrays being allocated are larger, I heap allocate, and this isn't a problem because in the large array case, the number crunching takes up enough time that heap allocations aren't much of a bottleneck anymore. However, this is also butt ugly because I have to use a bunch of pointers and casts and explicitly test the size for it to work. Given that this is highly useful, I think a nice feature for D would be to have some kind of a scheme to make this clean. For example: auto foo = newTemp[length]; If length is larger than some arbitrary but reasonable value, this is heap allocated. If length is small, it is stack allocated. Either way, allowing the array to escape the current scope is undefined behavior and the compiler would detect at least the simple cases of this. Ideally, since this would only be used for temp variables in performance-critical code, the contents of the array would also not be initialized. It would be nice to do this in a library, but there's no obvious way to do it without compiler support, since calling a function sets up a new stack frame. I'd like to just do it with an interface like: auto foo = newTemp!(T)(length). Anyone know any ASM frame pointer hacks to make something like this workable? Otherwise, is there enough interest in this that it might belong in the core language?I, for one, am very interested and pleaded to Walter to introduce stack-allocation for simple arrays. I could point at an informal case study in which many students of mine discovered the related g++ language extension for C++ and used it to great effect. Andrei
Nov 11 2008
On Tue, 11 Nov 2008 20:39:07 +0300, dsimcha <dsimcha yahoo.com> wrote:I've noticed that, when writing multithreaded number crunching code, allocating memory for temporary storage can be a huge threading bottleneck. Of course, the obvious, but ugly solution is to pre-allocate and recycle temp variables. However, this breaks encapsulation at every place imaginable. An alternative that I've tried recently is to use alloca to stack allocate the temporary arrays if they are small enough. If the arrays being allocated are larger, I heap allocate, and this isn't a problem because in the large array case, the number crunching takes up enough time that heap allocations aren't much of a bottleneck anymore. However, this is also butt ugly because I have to use a bunch of pointers and casts and explicitly test the size for it to work. Given that this is highly useful, I think a nice feature for D would be to have some kind of a scheme to make this clean. For example: auto foo = newTemp[length]; If length is larger than some arbitrary but reasonable value, this is heap allocated. If length is small, it is stack allocated. Either way, allowing the array to escape the current scope is undefined behavior and the compiler would detect at least the simple cases of this. Ideally, since this would only be used for temp variables in performance-critical code, the contents of the array would also not be initialized. It would be nice to do this in a library, but there's no obvious way to do it without compiler support, since calling a function sets up a new stack frame. I'd like to just do it with an interface like: auto foo = newTemp!(T)(length). Anyone know any ASM frame pointer hacks to make something like this workable? Otherwise, is there enough interest in this that it might belong in the core language?I would *LOVE* to see Variable-Length Arrays in D (they are implemented in C99) void foo(int i) { int[i] myArray; // this is a static stack-allocated array (size can't be changed) assert(myArray.length() == i); }
Nov 11 2008
== Quote from Denis Koroskin (2korden gmail.com)'s article

    I would *LOVE* to see Variable-Length Arrays in D (they are implemented in C99)

    void foo(int i)
    {
        int[i] myArray; // this is a static stack-allocated array (size can't be changed)
        assert(myArray.length() == i);
    }

Yes, a few people have said this in the past. I'm suggesting something a little more than that. I'm saying that, to make this relatively safe, the compiler automatically "does the right thing" depending on how large an array you are allocating. If it's big enough to risk overflowing the stack, then it does heap allocation. Better yet, if the array is large, it does heap allocation *and* deletes the array for you at any exit points of the function so that the GC doesn't run as much, since the lifetime of the array is known at compile time.
Nov 11 2008
Since your code is designed to work in this way, you can implement allocation in a library. If you have one heap per thread, allocation is no longer a bottleneck. If you allocate only scope arrays, the heap can use the simplest stack design and will be extremely fast for this. Though with such a design you should take care about which heap you allocate from and which heap you free to :3
Nov 11 2008
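[Editor's sketch] One shape the per-thread, LIFO-only library heap Kagamin hints at could take; the name, the fixed 1 MB capacity, and the one-instance-per-thread assumption are all invented for illustration:

-----
// LIFO-only region: alloc and free are just pointer bumps.
struct ScopeHeap
{
    private ubyte[] buffer;
    private size_t  top;

    void* alloc(size_t bytes)
    {
        if (buffer is null)
            buffer = new ubyte[1024 * 1024]; // never grown or freed in this sketch
        assert(top + bytes <= buffer.length, "ScopeHeap exhausted");
        auto p = buffer.ptr + top;
        top += bytes;
        return p;
    }

    // Must be called in strict reverse order of alloc, with the same sizes --
    // this is the "take care which heap you free to" caveat above.
    void free(size_t bytes)
    {
        top -= bytes;
    }
}

// One ScopeHeap per thread is assumed; making that automatic needs
// thread-local storage, which is hand-waved here.
-----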
dsimcha wrote:I've noticed that, when writing multithreaded number crunching code, allocating memory for temporary storage can be a huge threading bottleneck. Of course, the obvious, but ugly solution is to pre-allocate and recycle temp variables. However, this breaks encapsulation at every place imaginable. An alternative that I've tried recently is to use alloca to stack allocate the temporary arrays if they are small enough. If the arrays being allocated are larger, I heap allocate, and this isn't a problem because in the large array case, the number crunching takes up enough time that heap allocations aren't much of a bottleneck anymore. However, this is also butt ugly because I have to use a bunch of pointers and casts and explicitly test the size for it to work. Given that this is highly useful, I think a nice feature for D would be to have some kind of a scheme to make this clean. For example: auto foo = newTemp[length]; If length is larger than some arbitrary but reasonable value, this is heap allocated. If length is small, it is stack allocated. Either way, allowing the array to escape the current scope is undefined behavior and the compiler would detect at least the simple cases of this. Ideally, since this wouldI'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compiler can always fall back on heap-allocation. That could be slightly tricky if you want to make sure the array is deleted in the heap-alloc case, but an initial implementation could always just let the GC handle it. ("T[len] foo;" for non-constant len would also be a nice syntax) A workaround you could use: ----- // N == your "arbitrary but reasonable value", use whatever you want. // (Add "= void" to next line to avoid initialization if you prefer) T[N] stackbuf; // don't use this except in next line T[] buf = stackbuf; // Set reference to stack buffer // Shrink to smaller size or heap-allocate bigger size: buf.length = length; ----- The disadvantage being, of course, that stackbuf will always be N items. If the majority of the cases is for the needed space to be much smaller an alloca-using solution may use much less stack space, especially in code where functions using this call each other.only be used for temp variables in performance-critical code, the contents of the array would also not be initialized.I disagree with not initializing it. By default it should be initialized. An explicit "don't initialize this, please" syntax would be nice, but the default should be safe.It would be nice to do this in a library, but there's no obvious way to do it without compiler support, since calling a function sets up a new stack frame. I'd like to just do it with an interface like: auto foo = newTemp!(T)(length). Anyone know any ASM frame pointer hacks to make something like this workable? Otherwise, is there enough interest in this that it might belong in the core language?Implementing alloca isn't actually all that hard in ASM; it just won't be in any way portable. The needed code depends on both the architecture (obviously) and the calling convention. Oh, and on the assumption that the compiler generates no code that assumes it knows the stack frame size (i.e. 
don't use -fomit-frame-pointer or equivalent, make sure your compiler uses a frame pointer[1]) [1]: And doesn't extend the frame and address the extended part using the frame pointer -- but most compilers don't do that anyway because it's less efficient, better to just pre-allocate the entire stackframe at the start.
Nov 11 2008
== Quote from Frits van Bommel (fvbommel REMwOVExCAPSs.nl)'s article

    A workaround you could use:
    -----
    // N == your "arbitrary but reasonable value", use whatever you want.
    // (Add "= void" to next line to avoid initialization if you prefer)
    T[N] stackbuf;       // don't use this except in next line
    T[] buf = stackbuf;  // Set reference to stack buffer
    // Shrink to smaller size or heap-allocate bigger size:
    buf.length = length;
    -----

If I'm going to do something this hackish and ugly, I may as well just use alloca explicitly.
Nov 11 2008
"Frits van Bommel" <fvbommel REMwOVExCAPSs.nl> wrote in message news:gfcjb4$25un$1 digitalmars.com...dsimcha wrote:I'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').I've noticed that, when writing multithreaded number crunching code, allocating memory for temporary storage can be a huge threading bottleneck. Of course, the obvious, but ugly solution is to pre-allocate and recycle temp variables. However, this breaks encapsulation at every place imaginable. An alternative that I've tried recently is to use alloca to stack allocate the temporary arrays if they are small enough. If the arrays being allocated are larger, I heap allocate, and this isn't a problem because in the large array case, the number crunching takes up enough time that heap allocations aren't much of a bottleneck anymore. However, this is also butt ugly because I have to use a bunch of pointers and casts and explicitly test the size for it to work. Given that this is highly useful, I think a nice feature for D would be to have some kind of a scheme to make this clean. For example: auto foo = newTemp[length]; If length is larger than some arbitrary but reasonable value, this is heap allocated. If length is small, it is stack allocated. Either way, allowing the array to escape the current scope is undefined behavior and the compiler would detect at least the simple cases of this. Ideally, since this wouldI'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compilercan always fall back on heap-allocation. That could be slightly tricky if you want to make sure the array is deleted in the heap-alloc case, but an initial implementation could always just let the GC handle it. ("T[len] foo;" for non-constant len would also be a nice syntax) A workaround you could use: ----- // N == your "arbitrary but reasonable value", use whatever you want. // (Add "= void" to next line to avoid initialization if you prefer) T[N] stackbuf; // don't use this except in next line T[] buf = stackbuf; // Set reference to stack buffer // Shrink to smaller size or heap-allocate bigger size: buf.length = length; ----- The disadvantage being, of course, that stackbuf will always be N items. If the majority of the cases is for the needed space to be much smaller an alloca-using solution may use much less stack space, especially in code where functions using this call each other.only be used for temp variables in performance-critical code, the contents of the array would also not be initialized.I disagree with not initializing it. By default it should be initialized. An explicit "don't initialize this, please" syntax would be nice, but the default should be safe.It would be nice to do this in a library, but there's no obvious way to do it without compiler support, since calling a function sets up a new stack frame. I'd like to just do it with an interface like: auto foo = newTemp!(T)(length). Anyone know any ASM frame pointer hacks to make something like this workable? Otherwise, is there enough interest in this that it might belong in the core language?Implementing alloca isn't actually all that hard in ASM; it just won't be in any way portable. 
The needed code depends on both the architecture (obviously) and the calling convention. Oh, and on the assumption that the compiler generates no code that assumes it knows the stack frame size (i.e. don't use -fomit-frame-pointer or equivalent, make sure your compiler uses a frame pointer[1]) [1]: And doesn't extend the frame and address the extended part using the frame pointer -- but most compilers don't do that anyway because it's less efficient, better to just pre-allocate the entire stackframe at the start.
Nov 11 2008
Dave wrote:As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compilerI'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
Nov 11 2008
Janderson wrote:Dave wrote:But this won't work if size is runtime-determined.As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compilerI'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
Nov 12 2008
KennyTM~ wrote:Janderson wrote:Thanks for clarifying my last point :) -JoelDave wrote:But this won't work if size is runtime-determined.As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compilerI'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
Nov 12 2008
Janderson wrote:KennyTM~ wrote:Is it? I mean things like: int size = read_from_stdin(); scope array = new FixedSizeArray!(int)(size); You can't do it FixedSizeArray!(int, size) because all template arguments must be determined at compile time. The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods. I hope I don't misunderstand your last point :).Janderson wrote:Thanks for clarifying my last point :) -JoelDave wrote:But this won't work if size is runtime-determined.As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compilerI'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
Nov 12 2008
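[Editor's sketch] The hybrid KennyTM~ describes, written out: a fixed in-place buffer plus a heap fallback chosen at construction. The name, the 1024-element default, and the D2-style ref return are assumptions for illustration:

-----
struct HybridArray(T, size_t N = 1024)
{
    private T[N]   small;   // always occupies N * T.sizeof bytes, as noted above
    private T[]    big;     // non-null only when the requested size exceeds N
    private size_t len;

    static HybridArray opCall(size_t n)
    {
        HybridArray a;
        a.len = n;
        if (n > N)
            a.big = new T[n];   // fallback to the heap
        return a;
    }

    // Every access pays the conditional mentioned above.
    ref T opIndex(size_t i)
    {
        if (big !is null)
            return big[i];
        return small[i];
    }

    size_t length() { return len; }
}

// Usage:
//   auto a = HybridArray!(int)(runtimeSize);
//   a[0] = 42;
-----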
On Thu, 13 Nov 2008 01:54:59 +0800, KennyTM~ <kennytm gmail.com> wrote:Janderson wrote:Mixin brute force: import std.c.stdlib; template FixedSizeArray(T, string name, alias size) { mixin ("scope " ~ name ~ " = size > 1000 ? new T[size] : (cast(T*)alloca(size * T.sizeof))[0..size];"); } void main() { size_t i = 100; mixin FixedSizeArray!(int, "arr", i); arr[] = 123; i = 10000; mixin FixedSizeArray!(int, "arr2", i); arr2[] = 123; }KennyTM~ wrote:Is it? I mean things like: int size = read_from_stdin(); scope array = new FixedSizeArray!(int)(size); You can't do it FixedSizeArray!(int, size) because all template arguments must be determined at compile time. The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods. I hope I don't misunderstand your last point :).Janderson wrote:Thanks for clarifying my last point :) -JoelDave wrote:But this won't work if size is runtime-determined.As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compilerI'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
Nov 12 2008
KennyTM~ wrote:

    The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.

This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows:

void* SuperStack.allocate(size_t bytes);
void SuperStack.free(size_t bytes);
size_t SuperStack.slack();

The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture:

T[] SuperStack.array(T)(size_t objects);
enum Uninitialized { yeahIKnow }
T[] SuperStack.array(T)(size_t objects, Uninitialized);

Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this:

void foo(in size_t s)
{
    auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
    scope(exit) SuperStack.free(s);
    ... play with a ...
}

Is there interest for such a thing?

Andrei
Nov 12 2008
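[Editor's sketch] A rough library-side sketch of what those primitives might look like; the chunk size, the growth policy, and the thread-local handling are guesses, not a specification of Andrei's design:

-----
struct SuperStack
{
    // One chunk list per thread is assumed; real TLS handling is omitted.
    private static ubyte[][] chunks;
    private static size_t    used;   // bytes used in the current (last) chunk
    private static const size_t chunkSize = 64 * 1024;

    static size_t slack()
    {
        return chunks.length ? chunks[$ - 1].length - used : 0;
    }

    static void* allocate(size_t bytes)
    {
        if (bytes > slack())
        {
            // Grab a fresh chunk; oversized requests get a chunk of their own.
            chunks ~= new ubyte[bytes > chunkSize ? bytes : chunkSize];
            used = 0;
        }
        auto p = chunks[$ - 1].ptr + used;
        used += bytes;
        return p;
    }

    static void free(size_t bytes)
    {
        // Strictly LIFO. Empty chunks are deliberately kept around, per the
        // "don't free immediately" point above; freeing back across a chunk
        // boundary is not handled in this sketch.
        used -= bytes;
    }

    static T[] array(T)(size_t objects)
    {
        auto a = (cast(T*) allocate(objects * T.sizeof))[0 .. objects];
        a[] = T.init;   // the Uninitialized overload from the post would skip this
        return a;
    }
}
-----

Note that at the call site D spells the instantiation SuperStack.array!(int)(s) rather than SuperStack.array(int)(s).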
Andrei Alexandrescu wrote:KennyTM~ wrote:It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing?I suppose I'm wondering what the advantage is of this approach over simply using dynamic memory. Is it the fact that allocations are non-locking? Pure stack allocations make sense because all such allocations are scoped and guaranteed to be extremely fast. So I suppose this is simply a parallel local "stack" for the same purpose and the goal is to avoid the assembly magic necessary for an alloca() type solution? And if this is the case, then I assume that sharing would be illegal (at least on D2)? My initial reaction is that I'm skeptical of any attempt at manual replacement of built-in functionality. It's been shown, for example, that custom allocators often perform worse than the default allocators. That aside, the general approach does seem reasonable. Though I'd still prefer we just get real stack-based dynamic allocation in D. It seems quite natural that "scope" should provide this for arrays as a QOI feature, as it does for classes. Sean
Nov 12 2008
Sean Kelly wrote:Andrei Alexandrescu wrote:Damn straight it could free() them assuming it has malloc()ated them!KennyTM~ wrote:It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.Speed. An allocation/deallocation is most of the time a counter bump and a check.With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing?I suppose I'm wondering what the advantage is of this approach over simply using dynamic memory.Is it the fact that allocations are non-locking?No.Pure stack allocations make sense because all such allocations are scoped and guaranteed to be extremely fast. So I suppose this is simply a parallel local "stack" for the same purpose and the goal is to avoid the assembly magic necessary for an alloca() type solution? And if this is the case, then I assume that sharing would be illegal (at least on D2)?It is better than the stack because it can tap into the entire heap, as opposed to whatever was reserved for the stack. You can also expand an array on the superstack, whereas you can't with alloca. (I forgot to provide that primitive.) Yah, and indeed recall that D2 is all built around no implicit sharing across threads.My initial reaction is that I'm skeptical of any attempt at manual replacement of built-in functionality. It's been shown, for example, that custom allocators often perform worse than the default allocators.I seem to recall that. Oh, wait, I gave a talk on that, several times :o). Emery Berger studied the problem. It turns out user-defined heaps are usually worse off, with one exception: regions. SuperStack is a region. That's why I'm counting it could make a difference.That aside, the general approach does seem reasonable. Though I'd still prefer we just get real stack-based dynamic allocation in D. It seems quite natural that "scope" should provide this for arrays as a QOI feature, as it does for classes.I agree. There are a few problems. One is that loops can make things tricky, as you need to call alloca only once. The other has to do with the way Walter generates code, I forgot the details, but it's fixable if some time is invested. And the third is, as I said, stack allocation can actually fail long before the system runs out of memory. It has happened to me. Andrei
Nov 12 2008
Andrei Alexandrescu wrote:Sean Kelly wrote:Or delete them if it's newed them :-) Assuming we want the GC to potentially scan at least some of these superblocks.It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.Damn straight it could free() them assuming it has malloc()ated them!Gotcha.I suppose I'm wondering what the advantage is of this approach over simply using dynamic memory.Speed. An allocation/deallocation is most of the time a counter bump and a check.Makes sense. I think stack allocations tend to get a bit weird if you ask for more than a page or so of memory anyway.Is it the fact that allocations are non-locking?No.Pure stack allocations make sense because all such allocations are scoped and guaranteed to be extremely fast. So I suppose this is simply a parallel local "stack" for the same purpose and the goal is to avoid the assembly magic necessary for an alloca() type solution? And if this is the case, then I assume that sharing would be illegal (at least on D2)?It is better than the stack because it can tap into the entire heap, as opposed to whatever was reserved for the stack. You can also expand an array on the superstack, whereas you can't with alloca. (I forgot to provide that primitive.) Yah, and indeed recall that D2 is all built around no implicit sharing across threads.Fair enough :-)My initial reaction is that I'm skeptical of any attempt at manual replacement of built-in functionality. It's been shown, for example, that custom allocators often perform worse than the default allocators.I seem to recall that. Oh, wait, I gave a talk on that, several times :o). Emery Berger studied the problem. It turns out user-defined heaps are usually worse off, with one exception: regions. SuperStack is a region. That's why I'm counting it could make a difference.Sounds like it's worth doing then. At worst it makes for an easy search/replace later if we get this feature natively. I'd certainly use it anyway. I currently use alloca() or new/delete depending on the situation, and it seems convenient to flatten all this into the use of a specialized library module. SeanThat aside, the general approach does seem reasonable. Though I'd still prefer we just get real stack-based dynamic allocation in D. It seems quite natural that "scope" should provide this for arrays as a QOI feature, as it does for classes.I agree. There are a few problems. One is that loops can make things tricky, as you need to call alloca only once. The other has to do with the way Walter generates code, I forgot the details, but it's fixable if some time is invested. And the third is, as I said, stack allocation can actually fail long before the system runs out of memory. It has happened to me.
Nov 12 2008
Andrei Alexandrescu wrote:Sean Kelly wrote:It wouldn't even need to return the memory to the OS. It could even defer that until the next run of the gc, possibly avoiding the need for slack. (So the first action of the gc would be drop the unused space from the superstacks; if that's enough, there's no need to do a full collect). Of course that only makes sense if the OS allocation is slow. Anyway, the idea sounds great to me. It can be faster than stack allocation in some cases, since it can result in less cache pollution (dramatically so in the case where you have tail recursion of a function which is allocating large arrays; all the return addresses stay next to each other on the stack, and the allocated arrays don't need to be touched when they're being deleted).Andrei Alexandrescu wrote:Damn straight it could free() them assuming it has malloc()ated them!KennyTM~ wrote:It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.
Nov 12 2008
On Wed, 12 Nov 2008 21:59:16 +0300, Sean Kelly <sean invisibleduck.org> wrote:Andrei Alexandrescu wrote:Not only it is almost as fast as stack allocation, it is much safer (you never get stack overflow, for example).KennyTM~ wrote:It's an implementation detail, but when the owner thread dies it should null the links for all the list nodes, assuming the data contained in those nodes can be shared across threads.The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks.With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing?I suppose I'm wondering what the advantage is of this approach over simply using dynamic memory. Is it the fact that allocations are non-locking? Pure stack allocations make sense because all such allocations are scoped and guaranteed to be extremely fast.So I suppose this is simply a parallel local "stack" for the same purpose and the goal is to avoid the assembly magic necessary for an alloca() type solution? And if this is the case, then I assume that sharing would be illegal (at least on D2)? My initial reaction is that I'm skeptical of any attempt at manual replacement of built-in functionality. It's been shown, for example, that custom allocators often perform worse than the default allocators. That aside, the general approach does seem reasonable. Though I'd still prefer we just get real stack-based dynamic allocation in D. It seems quite natural that "scope" should provide this for arrays as a QOI feature, as it does for classes.I agree, making "scope T[] t = new T[100];" stack-allocated is great. // D1/D2 scope T[] t1 = new T[size]; // stack-allocated // D2 shared scope T[] t2 = new T[size]; // Does it make any sense? Should it be an error?
Nov 12 2008
Denis Koroskin wrote:// D2 shared scope T[] t2 = new T[size]; // Does it make any sense? Should it be an error?It makes sense. Consider how futures work: shared scope T[] t2 = new T[size]; auto f = new ComputeAsync(t2); ... writefln( f.val ); Here, the computation could be done by another thread but in this situation we know for certain that the other thread will be done with it before t2 is destroyed. Sean
Nov 12 2008
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:gff7g0$t54$1 digitalmars.com...KennyTM~ wrote:That would be interesting and even better if it could be a built-in (not clear on if that was the intention or not). The reason for having it be a built-in is (IMO) it's important for D to provide language constructs that can improve performance for common things like allocating temporary, scope-limited arrays and classes, especially with the new direction D2 discussions have been heading in lately (like pure functions and such).The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture: T[] SuperStack.array(T)(size_t objects); enum Uninitialized { yeahIKnow } T[] SuperStack.array(T)(size_t objects, Uninitialized); Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing?Andrei
Nov 12 2008
"Dave" <Dave_Member pathlink.com> wrote in message news:gffgig$1ge6$1 digitalmars.com..."Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:gff7g0$t54$1 digitalmars.com...Oh, and any built-in syntax needs to provide a way to allocate uninitialized data also (yeah, I know)...KennyTM~ wrote:That would be interesting and even better if it could be a built-in (not clear on if that was the intention or not). The reason for having it be a built-in is (IMO) it's important for D to provide language constructs that can improve performance for common things like allocating temporary, scope-limited arrays and classes, especially with the new direction D2 discussions have been heading in lately (like pure functions and such).The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture: T[] SuperStack.array(T)(size_t objects); enum Uninitialized { yeahIKnow } T[] SuperStack.array(T)(size_t objects, Uninitialized); Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing?Andrei
Nov 12 2008
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s articleKennyTM~ wrote:That sounds terrific, although I think it would be even better if it were built into the runtime. This way, all scope objects, including arrays, could be allocated on the SuperStack. If this is in the language core, the compiler could even optimize at least simple cases automatically, if it could prove that something doesn't escape. The only hard part would be getting the heuristics right for when to free the chunks. I haven't thought through all the contingencies, but I guess this would be as fast as stack allocation, assuming enough slack.The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods.This entire discussion gets me thinking - could an alternate stack be a good (or even better) solution? Consider a global per-thread "superstack" - a growable region that allocates memory in large chunks and makes sub-chunks accessible in a strictly LIFO manner. The primitives of the superstack are as follows: void* SuperStack.allocate(size_t bytes); void SuperStack.free(size_t bytes); size_t SuperStack.slack(); The SuperStack's management is a singly-linked list of large blocks. The slack function returns how many bytes are left over in the current chunk. If you request more than slack bytes, a new chunk is allocated etc. A SuperStack can grow indefinitely (and is susceptible to leaks). Some convenience functions complete the picture: T[] SuperStack.array(T)(size_t objects); enum Uninitialized { yeahIKnow } T[] SuperStack.array(T)(size_t objects, Uninitialized); Freeing chunks should not be done immediately in order to avoid pathological behavior when a function repeatedly allocates and frees a chunk just to fulfill some small data needs. With the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... } Is there interest for such a thing? Andrei
Nov 13 2008
Andrei Alexandrescu wrote:enum Uninitialized { yeahIKnow }Hahaha, brilliant! :DIs there interest for such a thing?Yes, absolutely. I use lots of stack allocations and the vision of an overflow is always lurking somewhere in the shadows... EASTL has a stack allocator, so we must have it as well ;) -- Tomasz Stachowiak http://h3.team0xf.com/ h3/h3r3tic on #D freenode
Nov 14 2008
KennyTM~ wrote:Janderson wrote:I assumed you knew the size at compile time (like in Dave's example) and wanted to avoid stack overflow. Modifying the stack at compile time would be a dangerous thing. I think the best solution would be to have a second stack for that sort of thing, which could be managed in a library still. -JoelKennyTM~ wrote:Is it? I mean things like: int size = read_from_stdin(); scope array = new FixedSizeArray!(int)(size); You can't do it FixedSizeArray!(int, size) because all template arguments must be determined at compile time. The best solution I can think of, without compiler modification is a struct/class that contains a static array member T[1024] and a dynamic array member T[] initialized to null; and the code chooses which member to use in the constructor. But this always occupies 1024*T.sizeof bytes and there will always be a conditional (if) sticked to all access methods. I hope I don't misunderstand your last point :).Janderson wrote:Thanks for clarifying my last point :) -JoelDave wrote:But this won't work if size is runtime-determined.As a work around, I imagine it would be possible to write a template that used the above syntax with a static if that would change depending on the size: Something like this (untested): class FastArray(T, int size) if (size < 1000) { T[size] a; ... Overload operators } class FastArray(T, int size) if (size >= 1000) { T a[] = new T[size]; ... Overload operators } //Use void foo() { scope FastArray array = new FastArray!(int, 10); //Stack scope FastArray array = new FastArray!(int, 10000); //Heap } Of course you never know where you are in the stack, so nesting these to much would be bad.I'd love for "scope foo = new T[len];" to do for arrays what "scope bar = new Class;" does for classes. And indeed, if it's too big the compilerI'm surprised it doesn't and see that as a bit inconsistent, with the only serious argument against it being that 'scope' couldn't be used for large dynamic arrays. But then again: class C { int[264_000] a; } void foo() { scope C c = new C; ... } could also overflow the stack. In either case the work-around would be the same (increase the stack size or not use 'scope').
Nov 12 2008
On Tue, 11 Nov 2008 17:39:07 +0000 (UTC), dsimcha wrote:If length is larger than some arbitrary but reasonable value, this is heap allocated. If length is small, it is stack allocated. Either way, allowing the array to escape the current scope is undefined behavior and the compiler would detect at least the simple cases of this. Ideally, since this would only be used for temp variables in performance-critical code, the contents of the array would also not be initialized.Would there be a problem if multiple stack-arrays were asked for. It might be that one or two such arrays are not going to upset the stack size but allocating 100 of them on the same stack might be a problem. Can the compiler help here? -- Derek Parnell Melbourne, Australia skype: derek.j.parnell
Nov 11 2008
Andrei Alexandrescu Wrote:

    With the SuperStack in place, code could look like this:

    void foo(in size_t s)
    {
        auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow);
        scope(exit) SuperStack.free(s);
        ... play with a ...
    }

void foo(in size_t s)
{
    StackFrame m; // struct, remember heap status in constructor
                  // restore in destructor
                  // bonus: can't alloc unless StackFrame is created
    auto a = m.alloc(int[])(s, Uninitialized.yeahIKnow);
    //scope(exit) SuperStack.free(s); you don't need free to free stack
    ... play with a ...
}
Nov 13 2008
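[Editor's sketch] One way the StackFrame idea could be fleshed out on top of a SuperStack-like region. It reuses the names from the sketch earlier in the thread, assumes D2 struct destructors, and is only an illustration, not Kagamin's implementation:

-----
struct StackFrame
{
    private size_t allocated;   // bytes handed out through this frame

    T[] alloc(T)(size_t n)
    {
        auto bytes = n * T.sizeof;
        allocated += bytes;
        return (cast(T*) SuperStack.allocate(bytes))[0 .. n];
    }

    ~this()
    {
        // Restore the region to where it was when this frame was created,
        // so no explicit free is needed at the call site.
        SuperStack.free(allocated);
    }
}

void foo(in size_t s)
{
    StackFrame m;
    auto a = m.alloc!(int)(s);
    // ... play with a; the memory is released when m goes out of scope ...
}
-----

Kagamin's version remembers the heap mark in a constructor; since D structs have no default constructor to hook, this sketch counts bytes as it hands them out instead.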
Kagamin wrote:Andrei Alexandrescu Wrote:Sweet!! AndreiWith the SuperStack in place, code could look like this: void foo(in size_t s) { auto a = SuperStack.array(int)(s, Uninitialized.yeahIKnow); scope(exit) SuperStack.free(s); ... play with a ... }void foo(in size_t s) { StackFrame m; //struct, remember heap status in constructor //restore in destructor //bonus: can't alloc unless StackFrame is created auto a = m.alloc(int[])(s, Uninitialized.yeahIKnow); //scope(exit) SuperStack.free(s); you don't need free to free stack ... play with a ... }
Nov 13 2008