www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - core.thread.Fiber --- runtime stack overflow unlike goroutines

reply "Carl Sturtivant" <sturtivant gmail.com> writes:
The default size of the runtime stack for a Fiber is 4*PAGESIZE 
which is very small, and a quick test shows that a Fiber suffers 
a stack overflow that doesn't lead to a clean termination when 
this limit is exceeded.

This makes it difficult to simulate deterministic alternation 
where the stack size needed is unpredictable because complex 
deterministic computations are going on inside Fibers.

In contrast, the Go programming language's goroutines can extend 
their stacks as needed at runtime, and so can be used to simulate 
deterministic alternation without this limitation, and yet be 
initially executed with each having only a small stack size.

There seems to be a claim that all that's needed to add 
D-routines (goroutines for D) is a scheduler and a Channel type, 
on top of Fiber.
http://forum.dlang.org/thread/lphnen$1ml7$1 digitalmars.com
See the initial post, point 7., as well as supporting remarks in 
later replies.

Am I missing something? Is there a clean and simple way to get 
Fiber to no longer suffer a stack overflow when implementing 
D-routines?
Aug 14 2014
next sibling parent "Yuriy" <yuriy.glukhov gmail.com> writes:
Infinite stack comes with a cost. Every function call first 
checks the state of the stack. This should be done with the help 
of compiler. And such a "tradeoff" could scare bare-metal 
programmers away from D. Though maybe there's a chance of making 
this stack check controllable, but not that i can think of.

On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant 
wrote:
 The default size of the runtime stack for a Fiber is 4*PAGESIZE 
 which is very small, and a quick test shows that a Fiber 
 suffers a stack overflow that doesn't lead to a clean 
 termination when this limit is exceeded.

 This makes it difficult to simulate deterministic alternation 
 where the stack size needed is unpredictable because complex 
 deterministic computations are going on inside Fibers.

 In contrast, the Go programming language's goroutines can 
 extend their stacks as needed at runtime, and so can be used to 
 simulate deterministic alternation without this limitation, and 
 yet be initially executed with each having only a small stack 
 size.

 There seems to be a claim that all that's needed to add 
 D-routines (goroutines for D) is a scheduler and a Channel 
 type, on top of Fiber.
 http://forum.dlang.org/thread/lphnen$1ml7$1 digitalmars.com
 See the initial post, point 7., as well as supporting remarks 
 in later replies.

 Am I missing something? Is there a clean and simple way to get 
 Fiber to no longer suffer a stack overflow when implementing 
 D-routines?
Aug 14 2014
prev sibling next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On 64 bit, reserve a huge chunk of memory, set a SEGV handler and 
commit more as needed. Basically how kernel thread stacks work. 
I've been meaning to do this but haven't gotten around to it yet.
Aug 14 2014
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 14 August 2014 at 18:52:00 UTC, Sean Kelly wrote:
 On 64 bit, reserve a huge chunk of memory, set a SEGV handler 
 and commit more as needed. Basically how kernel thread stacks 
 work. I've been meaning to do this but haven't gotten around to 
 it yet.
I think using some sort of thread-local shared heap pool is better approach in general as it does need any SEGV handling overhead and simplifies fiber implementation (all fibers are guaranteed to take fixed amount of memory). It is not a silver bullet and forces you to think much more about application memory layout but I believe this is much better approach for high performance services than segmented stack like in Go.
Aug 14 2014
prev sibling next sibling parent "Kagamin" <spam here.lot> writes:
On Thursday, 14 August 2014 at 18:52:00 UTC, Sean Kelly wrote:
 On 64 bit, reserve a huge chunk of memory, set a SEGV handler 
 and commit more as needed. Basically how kernel thread stacks 
 work. I've been meaning to do this but haven't gotten around to 
 it yet.
AFAIK, OS already provides this transparently: when you allocate memory, OS only reserves the memory range and commits pages when they are accessed.
Aug 15 2014
prev sibling next sibling parent reply "Kagamin" <spam here.lot> writes:
http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx
 Allocates memory charges (from the overall size of memory and 
 the paging files on disk) for the specified reserved memory 
 pages. The function also guarantees that when the caller later 
 initially accesses the memory, the contents will be zero. 
 Actual physical pages are not allocated unless/until the 
 virtual addresses are actually accessed.
Aug 15 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 15 August 2014 at 08:36:34 UTC, Kagamin wrote:
 http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx
 Allocates memory charges (from the overall size of memory and 
 the paging files on disk) for the specified reserved memory 
 pages. The function also guarantees that when the caller later 
 initially accesses the memory, the contents will be zero. 
 Actual physical pages are not allocated unless/until the 
 virtual addresses are actually accessed.
Oh handy, so there's basically no work to be done on Windows. I'll have to check the behavior of mmap on Posix.
Aug 15 2014
next sibling parent reply "Kagamin" <spam here.lot> writes:
On Friday, 15 August 2014 at 14:26:28 UTC, Sean Kelly wrote:
 Oh handy, so there's basically no work to be done on Windows.  
 I'll have to check the behavior of mmap on Posix.
I heard, calloc behaves this way on linux (COW blank page mapped to the entire range), it was discussed here some time ago.
Aug 15 2014
parent "Kagamin" <spam here.lot> writes:
On Friday, 15 August 2014 at 14:28:34 UTC, Kagamin wrote:
 On Friday, 15 August 2014 at 14:26:28 UTC, Sean Kelly wrote:
 Oh handy, so there's basically no work to be done on Windows.  
 I'll have to check the behavior of mmap on Posix.
I heard, calloc behaves this way on linux (COW blank page mapped to the entire range), it was discussed here some time ago.
Another abstraction for core.vmm.
Aug 15 2014
prev sibling parent "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 15 August 2014 at 14:26:28 UTC, Sean Kelly wrote:
 On Friday, 15 August 2014 at 08:36:34 UTC, Kagamin wrote:
 http://msdn.microsoft.com/en-us/library/windows/desktop/aa366887%28v=vs.85%29.aspx
 Allocates memory charges (from the overall size of memory and 
 the paging files on disk) for the specified reserved memory 
 pages. The function also guarantees that when the caller 
 later initially accesses the memory, the contents will be 
 zero. Actual physical pages are not allocated unless/until 
 the virtual addresses are actually accessed.
Oh handy, so there's basically no work to be done on Windows. I'll have to check the behavior of mmap on Posix.
It sounds like mmap (typically) works the same way on Linux, and that malloc generally does as well. I'll have to test this to be sure. If so, and if doing so is fast, I'm going to increase the default stack size on 64-bit systems to something reasonably large. And here I thought I would have to do this all manually.
Aug 15 2014
prev sibling parent "Carl Sturtivant" <sturtivant gmail.com> writes:
On Thursday, 14 August 2014 at 18:52:00 UTC, Sean Kelly wrote:
 On 64 bit, reserve a huge chunk of memory, set a SEGV handler 
 and commit more as needed. Basically how kernel thread stacks 
 work. I've been meaning to do this but haven't gotten around to 
 it yet.
Very nice; the hardware VM machinery takes care of it and there's only overhead when overflow occurs. D's Fibers will really be something remarkable for user-space with this facility.
Aug 15 2014
prev sibling next sibling parent "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant 
wrote:
 The default size of the runtime stack for a Fiber is 4*PAGESIZE 
 which is very small, and a quick test shows that a Fiber 
 suffers a stack overflow that doesn't lead to a clean 
 termination when this limit is exceeded.

 This makes it difficult to simulate deterministic alternation 
 where the stack size needed is unpredictable because complex 
 deterministic computations are going on inside Fibers.

 In contrast, the Go programming language's goroutines can 
 extend their stacks as needed at runtime, and so can be used to 
 simulate deterministic alternation without this limitation, and 
 yet be initially executed with each having only a small stack 
 size.

 There seems to be a claim that all that's needed to add 
 D-routines (goroutines for D) is a scheduler and a Channel 
 type, on top of Fiber.
 http://forum.dlang.org/thread/lphnen$1ml7$1 digitalmars.com
 See the initial post, point 7., as well as supporting remarks 
 in later replies.

 Am I missing something? Is there a clean and simple way to get 
 Fiber to no longer suffer a stack overflow when implementing 
 D-routines?
Segmented stacks come with a cost. Rust abandoned them for reasons you can read about here: https://mail.mozilla.org/pipermail/rust-dev/2013-November/006314.html I believe Go has taken steps to help mitigate stack thrashing but I don't know if they have been successful yet.
Aug 14 2014
prev sibling next sibling parent reply "Kagamin" <spam here.lot> writes:
On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant 
wrote:
 The default size of the runtime stack for a Fiber is 4*PAGESIZE 
 which is very small, and a quick test shows that a Fiber 
 suffers a stack overflow that doesn't lead to a clean 
 termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
Aug 15 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 15 August 2014 at 08:41:30 UTC, Kagamin wrote:
 On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant 
 wrote:
 The default size of the runtime stack for a Fiber is 
 4*PAGESIZE which is very small, and a quick test shows that a 
 Fiber suffers a stack overflow that doesn't lead to a clean 
 termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
Won't that kind of kill the purpose of Fiber as low-cost context abstraction? Stack size does add up for thousands of fibers.
Aug 15 2014
next sibling parent "Kagamin" <spam here.lot> writes:
On Friday, 15 August 2014 at 14:28:34 UTC, Dicebot wrote:
 Won't that kind of kill the purpose of Fiber as low-cost 
 context abstraction? Stack size does add up for thousands of 
 fibers.
I didn't measure it.
Aug 15 2014
prev sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 15 August 2014 at 14:28:34 UTC, Dicebot wrote:
 Won't that kind of kill the purpose of Fiber as low-cost 
 context abstraction? Stack size does add up for thousands of 
 fibers.
As long as allocation speed is fast for large allocs (which I have to test), I want to change the default size to be very large. The virtual address space in a 64-bit app is enormous, and if the memory is committed on demand then physical memory use should only match what the user actually requires. This should allow us to create millions of fibers and not overrun system memory, and also not worry about stack overruns, which has always been a concern with the default fiber stack size.
Aug 15 2014
next sibling parent reply "Sean Kelly" <sean invisibleduck.org> writes:
At least on OSX, it appears that mapping memory is constant time 
regardless of size, but there is some max total memory I'm 
allowed to map, presumably based on the size of a vmm lookup 
tabe.  The max block size I can allocate is 1 GB, and I can 
allocate roughly 131,000 of these blocks before getting an out of 
memory error.  If I reduce the block size to 4 MB I can allocate 
more than 10M blocks without error.  I think some default stack 
size around 4 MB seems about right.  Increasing the size to 16 MB 
failed after about 800,000 allocations, which isn't enough 
(potential) fibers.
Aug 15 2014
parent Martin Nowak <code dawg.eu> writes:
On 08/15/2014 05:09 PM, Sean Kelly wrote:
 At least on OSX, it appears that mapping memory is constant time
 regardless of size, but there is some max total memory I'm allowed to
 map, presumably based on the size of a vmm lookup tabe.  The max block
 size I can allocate is 1 GB, and I can allocate roughly 131,000 of these
 blocks before getting an out of memory error.  If I reduce the block
 size to 4 MB I can allocate more than 10M blocks without error.  I think
 some default stack size around 4 MB seems about right.  Increasing the
 size to 16 MB failed after about 800,000 allocations, which isn't enough
 (potential) fibers.
While pages are committed lazily it's not free to map more memory. For example the kernel will have to allocate and setup more page table entries. Furthermore we might need to use MAP_NORESERVE to not run into overcommit limitations, but that's not available on all OSes I think. It might make sense to raise the stack size again [1] if we had an idea what size would avoid common issues (vibe.d uses 64kB). 4MB is too much for the default size IMO. Given that the stack size is configurable let's instead try to improve the diagnostic of Fiber stack overflows first, we should also map a guard page on posix and add a segfault handler to detect overflows. [1]: https://github.com/D-Programming-Language/druntime/commit/9dca50fe65402fb3cdfbb689f1aca58dc835dce4#diff-8bb12ed976acf0a5132e877ec5a01ea8R3163
Sep 21 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 15 August 2014 at 14:45:02 UTC, Sean Kelly wrote:
 On Friday, 15 August 2014 at 14:28:34 UTC, Dicebot wrote:
 Won't that kind of kill the purpose of Fiber as low-cost 
 context abstraction? Stack size does add up for thousands of 
 fibers.
As long as allocation speed is fast for large allocs (which I have to test), I want to change the default size to be very large. The virtual address space in a 64-bit app is enormous, and if the memory is committed on demand then physical memory use should only match what the user actually requires. This should allow us to create millions of fibers and not overrun system memory, and also not worry about stack overruns, which has always been a concern with the default fiber stack size.
No, I was referring to the proposal to supply bigger stack size to Fiber constructor - AFAIR it currently does allocate that memory eagerly (and does not use any OS CoW tools), doesn't it?
Aug 15 2014
parent reply "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 15 August 2014 at 15:25:23 UTC, Dicebot wrote:
 No, I was referring to the proposal to supply bigger stack size 
 to Fiber constructor - AFAIR it currently does allocate that 
 memory eagerly (and does not use any OS CoW tools), doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc and mmap (which Fiber uses to allocate the stack) simply reserves the range and then commits it lazily, even though what you've told it to do is allocate the memory. This is really great news since it means that no code changes will be required to do the thing I wanted to do anyway.
Aug 15 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 15 August 2014 at 15:40:35 UTC, Sean Kelly wrote:
 On Friday, 15 August 2014 at 15:25:23 UTC, Dicebot wrote:
 No, I was referring to the proposal to supply bigger stack 
 size to Fiber constructor - AFAIR it currently does allocate 
 that memory eagerly (and does not use any OS CoW tools), 
 doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc and mmap (which Fiber uses to allocate the stack) simply reserves the range and then commits it lazily, even though what you've told it to do is allocate the memory. This is really great news since it means that no code changes will be required to do the thing I wanted to do anyway.
Guess that means some experimenting this weekend for me too :)
Aug 15 2014
parent "Dicebot" <public dicebot.lv> writes:
On Friday, 15 August 2014 at 15:50:30 UTC, Dicebot wrote:
 On Friday, 15 August 2014 at 15:40:35 UTC, Sean Kelly wrote:
 On Friday, 15 August 2014 at 15:25:23 UTC, Dicebot wrote:
 No, I was referring to the proposal to supply bigger stack 
 size to Fiber constructor - AFAIR it currently does allocate 
 that memory eagerly (and does not use any OS CoW tools), 
 doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc and mmap (which Fiber uses to allocate the stack) simply reserves the range and then commits it lazily, even though what you've told it to do is allocate the memory. This is really great news since it means that no code changes will be required to do the thing I wanted to do anyway.
Guess that means some experimenting this weekend for me too :)
Looks like it indeed just magically works. I have used this simple program: import core.thread; void foo() { // int[10240] array; Fiber.yield(); for (;;) {} } void main() { foreach (_ ; 0 .. 10_000) { //auto fiber = new Fiber(&foo); auto fiber = new Fiber(&foo, 100_000); fiber.call(); } } And checked peak memory profile via `time --format="%M"`. Fiber constructor argument size did not change memory consumption at all, only changing size of fiber stack variables did.
Aug 22 2014
prev sibling parent reply "Carl Sturtivant" <sturtivant gmail.com> writes:
On Friday, 15 August 2014 at 15:40:35 UTC, Sean Kelly wrote:
 On Friday, 15 August 2014 at 15:25:23 UTC, Dicebot wrote:
 No, I was referring to the proposal to supply bigger stack 
 size to Fiber constructor - AFAIR it currently does allocate 
 that memory eagerly (and does not use any OS CoW tools), 
 doesn't it?
I thought it did, but apparently the behavior of VirtualAlloc and mmap (which Fiber uses to allocate the stack) simply reserves the range and then commits it lazily, even though what you've told it to do is allocate the memory. This is really great news since it means that no code changes will be required to do the thing I wanted to do anyway.
Just read this after posting earlier replies! Very exciting. I'll be doing some experiments to see how this works out. What about at 32-bits?
Aug 15 2014
parent "Sean Kelly" <sean invisibleduck.org> writes:
On Friday, 15 August 2014 at 20:17:51 UTC, Carl Sturtivant wrote:
 On Friday, 15 August 2014 at 15:40:35 UTC, Sean Kelly wrote:
 I thought it did, but apparently the behavior of VirtualAlloc 
 and mmap (which Fiber uses to allocate the stack) simply 
 reserves the range and then commits it lazily, even though 
 what you've told it to do is allocate the memory.  This is 
 really great news since it means that no code changes will be 
 required to do the thing I wanted to do anyway.
Just read this after posting earlier replies! Very exciting. I'll be doing some experiments to see how this works out. What about at 32-bits?
I'm sure it works the same, but reserving large chunks of memory there would eat up the address space. I think the default will have to remain some reasonably low number on 32-bit.
Aug 15 2014
prev sibling parent reply "Carl Sturtivant" <sturtivant gmail.com> writes:
On Friday, 15 August 2014 at 08:41:30 UTC, Kagamin wrote:
 On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant 
 wrote:
 The default size of the runtime stack for a Fiber is 
 4*PAGESIZE which is very small, and a quick test shows that a 
 Fiber suffers a stack overflow that doesn't lead to a clean 
 termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
No good if the stack size needed depends dynamically on the computation in that Fiber.
Aug 15 2014
parent reply "Carl Sturtivant" <sturtivant gmail.com> writes:
On Friday, 15 August 2014 at 20:11:43 UTC, Carl Sturtivant wrote:
 On Friday, 15 August 2014 at 08:41:30 UTC, Kagamin wrote:
 On Thursday, 14 August 2014 at 07:46:29 UTC, Carl Sturtivant 
 wrote:
 The default size of the runtime stack for a Fiber is 
 4*PAGESIZE which is very small, and a quick test shows that a 
 Fiber suffers a stack overflow that doesn't lead to a clean 
 termination when this limit is exceeded.
Pass a bigger stack size to the Fiber constructor?
No good if the stack size needed depends dynamically on the computation in that Fiber.
Should have read further down the thread --- you're right as the memory is in effect merely reserved virtual memory and isn't actually allocated.
Aug 15 2014
parent reply ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 15 Aug 2014 20:19:18 +0000
Carl Sturtivant via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 Should have read further down the thread --- you're right as the=20
 memory is in effect merely reserved virtual memory and isn't=20
 actually allocated.
and we -- 32-bit addicts -- will run out of address space while 64-bit happy people will laugh looking at us. ;-)
Aug 15 2014
parent reply "Kagamin" <spam here.lot> writes:
On Friday, 15 August 2014 at 22:26:54 UTC, ketmar via 
Digitalmars-d-learn wrote:
 and we -- 32-bit addicts -- will run out of address space while 
 64-bit
 happy people will laugh looking at us. ;-)
You should only choose stack size carefully or keep data in TempAlloc instead of stack.
Aug 16 2014
parent ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Sat, 16 Aug 2014 17:49:28 +0000
Kagamin via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 You should only choose stack size carefully or keep data in=20
 TempAlloc instead of stack.
but i can choose stack size carefully without such 'address reserving' feature. ;-)
Aug 16 2014
prev sibling parent Martin Nowak <code dawg.eu> writes:
 Am I missing something? Is there a clean and simple way to get Fiber to
 no longer suffer a stack overflow when implementing D-routines?
Simply choose a big enough stack size when creating your fibers. http://dlang.org/library/core/thread/Fiber.this.html It's fairly cheap to use a really big stack like 4MB, because memory pages are committed lazily by the OS.
Sep 21 2014