digitalmars.D - alloca is slow and dangerous
- welkam (9/9) Jan 01 2021 Over the years I saw several people asking why D doesn't have
- Ola Fosheim Grøstad (4/7) Jan 01 2021 13% slower is nothing compared to the alternative which is malloc
- IGotD- (18/27) Jan 01 2021 Not entirely true. You need to check so that the amount elements
- Patrick Schluter (22/53) Jan 01 2021 I think it is even worse than for VLA, because for the VLA you
- welkam (15/28) Jan 03 2021 Because it requires backend help. alloca is a special function
- IGotD- (12/15) Jan 03 2021 I think the default initialization in D is an excellent feature.
- welkam (10/20) Jan 03 2021 Agree. Also Walter put an escape hatch because he knew that some
- Ola Fosheim Grøstad (6/11) Jan 03 2021 The standard approach in PL design is to either default
- Steven Schveighoffer (11/20) Jan 01 2021 D has alloca. It's in core.stdc.stdlib
- Ola Fosheim Grostad (10/13) Jan 01 2021 I never use alloca directly, but in C you can just use an int
- welkam (3/5) Jan 03 2021 Does it work on all compilers and all platforms?
- Steven Schveighoffer (10/16) Jan 04 2021 I don't know. I would expect it to work anywhere D is supported. Looking...
- Iain Buclaw (5/9) Jan 04 2021 There's no such thing as a platform that doesn't support alloca
- Johan (18/25) Jan 04 2021 Why wouldn't it?
- Max Haughton (5/24) Jan 04 2021 Nothing D will ever be used on will have this issue but I think
- Jacob Carlborg (4/8) Jan 06 2021 You can always implement a stack on the heap ;)
Over the years I have seen several people ask why D doesn't have alloca. I just want to share a video which states that alloca is slower than a static array, so we have one more argument against it.
https://youtu.be/FY9SbqTO5GQ?t=468

In summary, alloca is:
1. Hard to implement
2. A security problem
3. Slower than a static array on the stack
Jan 01 2021
On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
> Over the years I have seen several people ask why D doesn't have alloca. I just want to share a video which states that alloca is slower than a static array

13% slower is nothing compared to the alternatives, which are malloc or running out of stack (which will happen really fast if you keep piling up worst-case fixed-size buffers).
Jan 01 2021
On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
> Over the years I have seen several people ask why D doesn't have alloca. I just want to share a video which states that alloca is slower than a static array, so we have one more argument against it. https://youtu.be/FY9SbqTO5GQ?t=468
> In summary, alloca is:
> 1. Hard to implement

Why is it hard to implement?

> 2. A security problem

Not entirely true. You need to check that the number of elements is within reasonable limits. You can have stack overflows, but most operating systems have guard pages that detect such cases. In kernels that might not be the case, but in kernels you need to pay more attention than in user-space programs. Keep in mind that static arrays on the stack are just as prone to overflows as a VLA.

In D, static arrays have an additional problem: D initializes the array by default. For stability this is great, but for performance it takes more time. A VLA can be a better option here, as you initialize exactly the number of elements you need.

> 3. Slower than a static array on the stack

The reason was, strangely, a lot of code for just allocating on the stack. He said he wasn't even using -O2 optimization, and with -O2 it would be smaller. In general it shouldn't be that bad.

The fear of alloca is, in my opinion, exaggerated. Doesn't D implement alloca for those who want it? https://dlang.org/phobos/rt_alloca.html
Jan 01 2021
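As a rough illustration of the "check that the number of elements is within reasonable limits" point, here is a minimal sketch of a size-bounded alloca call in D. The names `maxStackElems` and `sumSquaresOnStack` and the limit of 1024 are assumptions made up for this example; only `core.stdc.stdlib.alloca` comes from the thread.

```d
import core.stdc.stdlib : alloca;

// Hypothetical upper bound; a real program would pick it from its stack budget.
enum size_t maxStackElems = 1024;

/// Sums i*i for i in [0, n) using a scratch buffer on the stack.
/// Returns -1 when n exceeds the limit instead of risking a stack overflow.
long sumSquaresOnStack(size_t n)
{
    if (n > maxStackElems)
        return -1;
    if (n == 0)
        return 0;

    // alloca has to be called in the function whose frame owns the memory;
    // the buffer disappears when this function returns.
    auto p = cast(long*) alloca(n * long.sizeof);
    long[] buf = p[0 .. n];

    foreach (i, ref x; buf)
        x = cast(long)(i * i);

    long total = 0;
    foreach (x; buf)
        total += x;
    return total;
}
```

The slice `p[0 .. n]` gives bounds-checked access to exactly the elements requested, which is the property argued for above: no worst-case buffer and no initialization of unused elements.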
On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
> On Friday, 1 January 2021 at 14:48:11 UTC, welkam wrote:
> > Over the years I have seen several people ask why D doesn't have alloca. I just want to share a video which states that alloca is slower than a static array, so we have one more argument against it. https://youtu.be/FY9SbqTO5GQ?t=468
> > In summary, alloca is:
> > 1. Hard to implement
> Why is it hard to implement?
> > 2. A security problem
> Not entirely true. You need to check that the number of elements is within reasonable limits. You can have stack overflows, but most operating systems have guard pages that detect such cases. In kernels that might not be the case, but in kernels you need to pay more attention than in user-space programs. Keep in mind that static arrays on the stack are just as prone to overflows as a VLA.

I think it is even worse than for a VLA, because for the VLA you will have used a computed length depending on the parameters, so a real size. With static arrays, one will generally use a "worst case" size, a value which is much more prone to errors than the real length required.
Anecdote: recent gcc versions (since version 7) have, with each release, better and better heuristics to check the potential size of buffers used with string combination functions (sprintf, strcpy, strcat etc.). The number of potential worst-case buffer overflows in legacy code is staggering.

> In D, static arrays have an additional problem: D initializes the array by default. For stability this is great, but for performance it takes more time. A VLA can be a better option here, as you initialize exactly the number of elements you need.

yes

> > 3. Slower than a static array on the stack
> The reason was, strangely, a lot of code for just allocating on the stack. He said he wasn't even using -O2 optimization, and with -O2 it would be smaller. In general it shouldn't be that bad.

afaik VLAs get in the way of clever frame pointer optimisations. Local variables and parameters have to be accessed with slightly more costly code than if the compiler knows at compile time exactly where they are. Better optimizers and code generation can alleviate this in a lot of cases.

> The fear of alloca is, in my opinion, exaggerated. Doesn't D implement alloca for those who want it? https://dlang.org/phobos/rt_alloca.html

While I'm neither afraid of nor sceptical towards VLA/alloca, I have to say after 10 years of use that there are really very few cases where it was a substantial improvement. It often spared a malloc/free pair, but more often than expected it added memory copies. YMMV.
Jan 01 2021
I'm going to respond to both messages.

On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
> Why is it hard to implement?

Because it requires backend help. alloca is a special function.

On Friday, 1 January 2021 at 17:48:22 UTC, Patrick Schluter wrote:
> On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
> > In D, static arrays have an additional problem: D initializes the array by default. For stability this is great, but for performance it takes more time. A VLA can be a better option here, as you initialize exactly the number of elements you need.
> yes

Usually this forum has excellent technical responses, but this time you dropped the ball. Behold: byte[128] buf = void;
Walter did a talk on this one. DConf 2019 Day 1 Keynote: Allocating Memory with the D Programming Language -- Walter Bright https://www.youtube.com/watch?t=2210&v=_PB6Hdi4R7M

On Friday, 1 January 2021 at 15:19:07 UTC, IGotD- wrote:
> > 3. Slower than a static array on the stack
> The reason was, strangely, a lot of code for just allocating on the stack. He said he wasn't even using -O2 optimization, and with -O2 it would be smaller. In general it shouldn't be that bad.

Later he said that when people replaced VLAs in the kernel they found a 13% speed-up. While it was not stated, a kernel should be benchmarked with optimizations enabled.
Jan 03 2021
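For readers who have not seen the `= void` escape hatch mentioned above, here is a small sketch of how it is typically used. The function name `sumFirst` and the buffer size are assumptions for illustration only; the point is that a void-initialized static array skips D's default initialization, so only the elements that were written may be read.

```d
/// Sums 0 + 1 + ... + (n - 1) through a stack buffer that is not
/// default-initialized. Hypothetical helper for illustration only.
int sumFirst(size_t n)
{
    int[128] zeroed;          // default-initialized: all 128 elements are 0
    int[128] scratch = void;  // no initialization; contents are indeterminate

    assert(n <= scratch.length, "n does not fit in the scratch buffer");

    // Write only the elements that are actually needed...
    foreach (i; 0 .. n)
        scratch[i] = cast(int) i;

    // ...and read back only what was written.
    int total = zeroed[0];
    foreach (x; scratch[0 .. n])
        total += x;
    return total;
}
```

This removes the initialization cost of a fixed-size array, but not the stack space it reserves, which is the remaining difference from a VLA or alloca.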
On Sunday, 3 January 2021 at 19:12:47 UTC, welkam wrote:
> Usually this forum has excellent technical responses, but this time you dropped the ball. Behold: byte[128] buf = void;

I think the default initialization in D is an excellent feature. MS has already stated that they want to do this in C++. Sure, your example circumvents that, but the default initialization still helps.

My point was more that you would allocate exactly the elements you need on the stack. For example, you may have a maximum of 1024 elements but usually only use one or two. Then a static allocation of 1024 elements becomes crazy big, and it also generates more cache misses.

I'm not sure why the stack allocation needs to be as enormously complicated as in the seminar. In general you just need to subtract from the stack pointer.
Jan 03 2021
On Sunday, 3 January 2021 at 19:33:10 UTC, IGotD- wrote:
> I think the default initialization in D is an excellent feature.

Agreed. Also, Walter put in an escape hatch because he knew that sometimes you don't want it, for performance reasons.

> MS has already stated that they want to do this in C++.

Not only MS but also the Linux people want to default-initialize everything, so they both put effort into compilers to improve optimizations, so they don't pay a performance penalty where they should not. And in the end we benefit too.

> My point was more that you would allocate exactly the elements you need on the stack. For example, you may have a maximum of 1024 elements but usually only use one or two. Then a static allocation of 1024 elements becomes crazy big, and it also generates more cache misses.

If that's the case, then you do as Walter showed: the common case on the stack and the uncommon big allocations on the heap.

> I'm not sure why the stack allocation needs to be as enormously complicated as in the seminar. In general you just need to subtract from the stack pointer.

I'm not following you.
Jan 03 2021
On Sunday, 3 January 2021 at 20:59:00 UTC, welkam wrote:
> > MS has already stated that they want to do this in C++.
> Not only MS but also the Linux people want to default-initialize everything, so they both put effort into compilers to improve optimizations, so they don't pay a performance penalty where they should not. And in the end we benefit too.

The standard approach in PL design is to either default-initialize or to statically establish that variables are not used until initialized (better, but more tricky). C++ was just a small addition to C, so that is where the semantics come from. C is the outlier here, not the norm...
Jan 03 2021
On 1/1/21 9:48 AM, welkam wrote:
> Over the years I have seen several people ask why D doesn't have alloca. I just want to share a video which states that alloca is slower than a static array, so we have one more argument against it. https://youtu.be/FY9SbqTO5GQ?t=468
> In summary, alloca is:
> 1. Hard to implement
> 2. A security problem
> 3. Slower than a static array on the stack

D has alloca. It's in core.stdc.stdlib: https://dlang.org/phobos/core_stdc_stdlib.html#.alloca

In my experience, using it isn't extremely beneficial. Since it's in stdc, it requires the C library, which means you have malloc/free, which is much safer and more usable.

I'd much rather use a static array, or malloc with scope(exit) free, or a combination of the two that uses the stack when it can and expands to malloc when needed. Or just use the GC...

-Steve
Jan 01 2021
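The "stack when it can, malloc when needed" combination described above could look roughly like the following. This is only a sketch under assumptions: the threshold `stackCapacity`, the function name `sumSquaresHybrid` and the workload are invented for illustration, while `malloc`, `free` and `scope(exit)` are the pieces named in the post.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical threshold: sizes up to this many elements stay on the stack.
enum size_t stackCapacity = 64;

/// Sums i*i for i in [0, n). Small n uses a fixed stack buffer,
/// large n falls back to malloc. Returns -1 if the allocation fails.
long sumSquaresHybrid(size_t n)
{
    long[stackCapacity] small = void; // skip default initialization
    long* heap = null;
    scope (exit) free(heap);          // free(null) is a no-op

    long[] buf;
    if (n <= stackCapacity)
    {
        buf = small[0 .. n];          // common case: no allocation at all
    }
    else
    {
        heap = cast(long*) malloc(n * long.sizeof);
        if (heap is null)
            return -1;
        buf = heap[0 .. n];           // uncommon case: heap-backed slice
    }

    foreach (i, ref x; buf)
        x = cast(long)(i * i);

    long total = 0;
    foreach (x; buf)
        total += x;
    return total;
}
```

The rest of the function only sees the slice `buf`, so the code that does the work does not care where the memory came from.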
On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer wrote:
> In my experience, using it isn't extremely beneficial. Since it's in stdc, it requires the C library, which means you have malloc/free, which is much safer and more usable.

I never use alloca directly, but in C you can just use an int variable for the dynamic array size. I find that useful for things like building a zero-terminated path that will only be used for one function call, or for an FFT buffer whose size is only known at runtime and where I want to use hot memory (already in cache). Doing that with a fixed size would take maybe 200 KB, which could easily be 100 times more than needed... And the next stack frame would be cold as hell...
Jan 01 2021
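The zero-terminated path use case translates to D's alloca fairly directly. The sketch below is an assumption-laden illustration: `cLength`, the `maxPath` cap and the use of `strlen` as a stand-in for "one C function call" are all made up for the example.

```d
import core.stdc.stdlib : alloca;
import core.stdc.string : strlen;

/// Copies a D string into a zero-terminated stack buffer and hands it to a
/// C function. strlen is only a stand-in for whatever C API needs the path.
/// Returns size_t.max when the path is too long for the stack buffer.
size_t cLength(const(char)[] path)
{
    enum size_t maxPath = 4096; // hypothetical cap on the stack allocation
    if (path.length >= maxPath)
        return size_t.max;

    // Exactly as many bytes as needed, plus one for the terminator.
    auto z = cast(char*) alloca(path.length + 1);
    z[0 .. path.length] = path[];
    z[path.length] = '\0';

    return strlen(z);
}
```

The buffer is valid only until `cLength` returns, which matches the "only used for one function call" pattern: nothing may keep the pointer afterwards.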
On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer wrote:
> D has alloca. It's in core.stdc.stdlib: https://dlang.org/phobos/core_stdc_stdlib.html#.alloca

Does it work on all compilers and all platforms?
Jan 03 2021
On 1/3/21 2:15 PM, welkam wrote:
> On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer wrote:
> > D has alloca. It's in core.stdc.stdlib: https://dlang.org/phobos/core_stdc_stdlib.html#.alloca
> Does it work on all compilers and all platforms?

I don't know. I would expect it to work anywhere D is supported. Looking at the code, DMD supports it, with GNU it's an intrinsic, and with LDC it's given a pragma to tag it (presumably so it can be recognized as an intrinsic).

Are there other compilers or platforms that work with D besides DMD, LDC, and GDC? Are there some GDC or LDC platforms that don't support alloca? If so, it's possible that some don't support it. But it's not versioned that way in the code.

-Steve
Jan 04 2021
On Monday, 4 January 2021 at 15:02:32 UTC, Steven Schveighoffer wrote:
> Are there other compilers or platforms that work with D besides DMD, LDC, and GDC? Are there some GDC or LDC platforms that don't support alloca? If so, it's possible that some don't support it. But it's not versioned that way in the code.

There's no such thing as a platform that doesn't support alloca, as far as I'm aware. As for C compilers that don't support alloca, they are niche and few.
Jan 04 2021
On Sunday, 3 January 2021 at 19:15:05 UTC, welkam wrote:
> On Friday, 1 January 2021 at 17:55:33 UTC, Steven Schveighoffer wrote:
> > D has alloca. It's in core.stdc.stdlib: https://dlang.org/phobos/core_stdc_stdlib.html#.alloca
> Does it work on all compilers and all platforms?

Why wouldn't it?

In C/C++/D/... language land, there is no such thing as "_the_ stack". Yeah, the _compiler_ may decide to use the special instructions/register of the CPU that address "the stack", but there is no guarantee it will; nor is there a guarantee that the binary executable with CPU stack instructions will actually use the stack and not just dynamically allocate things.

There are CPUs that do not have such special instructions, and there are platforms that do not use a "stack" in the common interpretation of the word to execute a D program. Examples: WebAssembly, an executable running with ASan's FakeStack enabled, a binary running in an emulator, ...

LDC emits the same LLVM IR "alloca" instruction for local variables (`int i;`) as for `alloca` function calls. Simplified: if you can write `int i;` for your platform, `core.stdc.stdlib.alloca` also works. ;)

-Johan
Jan 04 2021
On Monday, 4 January 2021 at 19:38:48 UTC, Johan wrote:
> On Sunday, 3 January 2021 at 19:15:05 UTC, welkam wrote:
> > [...]
> Why wouldn't it?
> In C/C++/D/... language land, there is no such thing as "_the_ stack". Yeah, the _compiler_ may decide to use the special instructions/register of the CPU that address "the stack", but there is no guarantee it will; nor is there a guarantee that the binary executable with CPU stack instructions will actually use the stack and not just dynamically allocate things.
> There are CPUs that do not have such special instructions, and there are platforms that do not use a "stack" in the common interpretation of the word to execute a D program. Examples: WebAssembly, an executable running with ASan's FakeStack enabled, a binary running in an emulator, ...
> LDC emits the same LLVM IR "alloca" instruction for local variables (`int i;`) as for `alloca` function calls. Simplified: if you can write `int i;` for your platform, `core.stdc.stdlib.alloca` also works. ;)
> -Johan

Nothing D will ever be used on will have this issue, but I think some niche architectures don't have alloca à la C. I've seen some VLIW architectures that can't use it as easily, but I can't remember exactly which ones.
Jan 04 2021
On 2021-01-04 20:38, Johan wrote:
> there are platforms that do not use a "stack" in the common interpretation of the word to execute a D program. Examples: WebAssembly, an executable running with ASan's FakeStack enabled, a binary running in an emulator, ...

You can always implement a stack on the heap ;)

-- 
/Jacob Carlborg
Jan 06 2021