digitalmars.D - Advice requested for fixing issue 17914
- Brian Schott (6/6) Oct 23 2017 Context: https://issues.dlang.org/show_bug.cgi?id=17914
- Kagamin (2/2) Oct 24 2017 Call destroy(fiber) when it completed execution? Who manages
- Steven Schveighoffer (4/10) Oct 24 2017 A failing use case would help. Fixing a bug when you can't reproduce is
- Brian Schott (3/6) Oct 24 2017 I've attached one to the bug report.
- qznc (8/14) Oct 24 2017 Looking at git blame [0], I guess Martin Nowak and Nemanja Boric
- Brian Schott (5/7) Oct 24 2017 I've been reading the Fiber code and (so far) that seems to
- safety0ff (13/18) Oct 24 2017 Just skimming the Fiber code I found the reset(...) API functions
- Steven Schveighoffer (32/38) Oct 25 2017 It appears that the limitation applies to mmap calls as well, and mmap
- Jonathan M Davis (4/14) Oct 25 2017 Maybe there was a change in the OS(es) being used that affected the limi...
- Nemanja Boric (13/31) Oct 25 2017 Yes, the stack is not immediately unmapped because it's very
- Nemanja Boric (4/25) Oct 25 2017 I'm not sure that would allow us to mprotect the page guards,
- Nemanja Boric (9/36) Oct 25 2017 I think the easiest way to proceed from here is to default the
- Steven Schveighoffer (12/42) Oct 25 2017 mmap does return an error. And onOutMemoryError is called when it fails.
- Nemanja Boric (21/77) Oct 25 2017 Reading FreeBSD man pages, it looks like at least FreeBSD has the
- Steven Schveighoffer (12/41) Oct 25 2017 Hm... the mprotect docs specifically state that calling mprotect on
- Nemanja Boric (10/46) Oct 25 2017 I'm sorry I wrote several messages in the row, as the thoughts
- Nemanja Boric (3/21) Oct 25 2017 On linux it's controllable by: `sysctl vm.max_map_count`
- Nemanja Boric (10/22) Oct 25 2017 Although after the stack overflow protection for fibers number of
Context: https://issues.dlang.org/show_bug.cgi?id=17914

I need to get this issue resolved as soon as possible so that the fix makes it into the next compiler release. Because it involves cleanup code in a class destructor, a design change may be necessary. Who should I contact to determine the best way to fix this bug?
Oct 23 2017
Call destroy(fiber) when it has completed execution? Who manages fibers?
Oct 24 2017
On 10/23/17 12:56 PM, Brian Schott wrote:
> Context: https://issues.dlang.org/show_bug.cgi?id=17914
>
> I need to get this issue resolved as soon as possible so that the fix makes it into the next compiler release. Because it involves cleanup code in a class destructor, a design change may be necessary. Who should I contact to determine the best way to fix this bug?

A failing use case would help. Fixing a bug when you can't reproduce it is difficult.

-Steve
Oct 24 2017
On Tuesday, 24 October 2017 at 14:28:01 UTC, Steven Schveighoffer wrote:
> A failing use case would help. Fixing a bug when you can't reproduce it is difficult.

I've attached one to the bug report.
Oct 24 2017
On Monday, 23 October 2017 at 16:56:32 UTC, Brian Schott wrote:
> Context: https://issues.dlang.org/show_bug.cgi?id=17914
> [...]

Looking at git blame [0], I guess Martin Nowak and Nemanja Boric seem to be pretty involved. Not sure how deep Petar Kirov and Sean Kelly are into Fibers.

My question wrt the bug: Why is munmap/freeStack called in the destructor? Could it be done right after termination?

[0] https://github.com/dlang/druntime/blame/ec9a79e15d446863191308fd5e20febce2053546/src/core/thread.d#L4077
Oct 24 2017
On Tuesday, 24 October 2017 at 21:49:10 UTC, qznc wrote:
> My question wrt the bug: Why is munmap/freeStack called in the destructor? Could it be done right after termination?

I've been reading the Fiber code and (so far) that seems to be reasonable. Can anybody think of a reason that this would be a bad idea? I'd rather not create a pull request for a design that's not going to work because of a detail I've overlooked.
Oct 24 2017
On Wednesday, 25 October 2017 at 01:26:10 UTC, Brian Schott wrote:
> I've been reading the Fiber code and (so far) that seems to be reasonable. [...]

Just skimming the Fiber code, I found the reset(...) API functions, whose purpose is to re-use Fibers once they've terminated. Eager stack deallocation would have to coexist with the Fiber reuse API.

Perhaps the Fiber reuse API could simply be polished & made easy to integrate so that your original use case no longer hits system limits, e.g. an optional delegate could be called upon termination, making it easier to hook in Fiber recycling.

The reason my thoughts head in that direction is that I've read that frequent mmap/munmap'ing isn't recommended in performance-conscious programs.
Oct 24 2017
On 10/23/17 12:56 PM, Brian Schott wrote:
> Context: https://issues.dlang.org/show_bug.cgi?id=17914
> [...]

It appears that the limitation applies to mmap calls as well, and the mmap call to allocate the stack has been in Fiber since, as far as I can tell, the beginning. How has this not shown up before?

Regardless of the cause, this puts a limitation on the number of simultaneous Fibers one can have. In other words, this is not just a problem with Fibers not being cleaned up properly, because one may need more than 65k fibers actually running simultaneously. We should try to prevent that as a limitation. For example, even the following code I would think is something we should support:

    void main()
    {
        import std.concurrency : Generator, yield;
        import std.stdio : File, writeln;

        auto f = File("/proc/sys/vm/max_map_count", "r");
        ulong n;
        f.readf("%d", &n);
        writeln("/proc/sys/vm/max_map_count = ", n);

        Generator!int[] gens; // retain pointers to all the generators
        foreach (i; 0 .. n + 1000)
        {
            if (i % 1000 == 0)
                writeln("i = ", i);
            gens ~= new Generator!int({ yield(1); });
        }
    }

If we *can't* do this, then we should provide a way to manage the limits. I.e. there should be a way to create more than the limit's number of fibers, but only allocate stacks when we can (and have a way to tell the user what's going on).

-Steve
Oct 25 2017
On Wednesday, October 25, 2017 09:26:26 Steven Schveighoffer via Digitalmars-d wrote:
> On 10/23/17 12:56 PM, Brian Schott wrote:
>> Context: https://issues.dlang.org/show_bug.cgi?id=17914
>> [...]
>
> It appears that the limitation applies to mmap calls as well, and the mmap call to allocate the stack has been in Fiber since, as far as I can tell, the beginning. How has this not shown up before?

Maybe there was a change in the OS(es) being used that affected the limit?

- Jonathan M Davis
Oct 25 2017
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
>> It appears that the limitation applies to mmap calls as well [...] How has this not shown up before?
>
> Maybe there was a change in the OS(es) being used that affected the limit?

Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection - creating new fibers (and doing unmap on termination) is a problem in real life (as is this one). At Sociomantic we already had this issue: https://github.com/sociomantic-tsunami/tangort/issues/2

Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.
Oct 25 2017
On Wednesday, 25 October 2017 at 15:12:27 UTC, Nemanja Boric wrote:
> Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection [...]
>
> Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.

I'm not sure that would allow us to mprotect the page guards, though.
Oct 25 2017
On Wednesday, 25 October 2017 at 15:22:18 UTC, Nemanja Boric wrote:
> I'm not sure that would allow us to mprotect the page guards, though.

I think the easiest way to proceed from here is to default the stack protection page size to 0, so we avoid this regression. Once that's in, we can think about a different allocation strategy for the fibers' stacks, or about throwing exceptions on too many fibers (too bad mmap doesn't return an error code - you get SIGABRT instead :( ).

What do you think?
Oct 25 2017
On 10/25/17 11:27 AM, Nemanja Boric wrote:
> I think the easiest way to proceed from here is to default the stack protection page size to 0, so we avoid this regression. Once that's in, we can think about a different allocation strategy for the fibers' stacks, or about throwing exceptions on too many fibers (too bad mmap doesn't return an error code - you get SIGABRT instead :( ).

mmap does return an error. And onOutOfMemoryError is called when it fails:

https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4518

Which should throw an error:

https://github.com/dlang/druntime/blob/144c9e6e9a3c00aba82b92da527a52190fe91c97/src/core/exception.d#L542

However, when mprotect fails, it calls abort():

https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4540

> What do you think?

I think we should reverse the mprotect default, and try and determine a better way to opt in to the limit.

Is this a Linux-specific problem? Are there issues with mprotect on other OSes? Or is Linux the only OS that supports mprotect?

-Steve
Oct 25 2017
On Wednesday, 25 October 2017 at 15:43:12 UTC, Steven Schveighoffer wrote:
> mmap does return an error. And onOutOfMemoryError is called when it fails. [...] However, when mprotect fails, it calls abort().

Reading the FreeBSD man pages, it looks like at least FreeBSD has the same limitation in mmap, but the man page for mprotect doesn't specify this. However, I believe that's just an issue in the man pages, as it would really be the same thing as with Linux. Linux's man page for mprotect specifically says this:

> ENOMEM Changing the protection of a memory region would result in the total number of mappings with distinct attributes (e.g., read versus read/write protection) exceeding the allowed maximum. (For example, making the protection of a range PROT_READ in the middle of a region currently protected as PROT_READ|PROT_WRITE would result in three mappings: two read/write mappings at each end and a read-only mapping in the middle.)

so I was right about doubling the mappings.

> I think we should reverse the mprotect default, and try and determine a better way to opt in to the limit.

I agree: https://github.com/dlang/druntime/pull/1956
Oct 25 2017
On 10/25/17 11:12 AM, Nemanja Boric wrote:
> Yes, the stack is not immediately unmapped because it's very common just to reset the fiber and reuse it for handling the new connection [...]
>
> Maybe this is the way to go - I don't see a reason why every stack should be mmaped separately.

Hm... the mprotect docs specifically state that calling mprotect on something that's not allocated via mmap is undefined. So if you use the GC to allocate Fiber stacks, you can't mprotect it.

I think what we need is a more configurable way to allocate stacks. There is a tradeoff to mprotect vs. simple allocation, and it's not obvious to choose one over the other.

I still am baffled as to why this is now showing up. Perhaps if you are using mmap as an allocator (as Fiber seems to be doing), it doesn't count towards the limit? Maybe it's just glommed into the standard allocator's space?

-Steve
Oct 25 2017
On Wednesday, 25 October 2017 at 15:32:36 UTC, Steven Schveighoffer wrote:
> [...] I still am baffled as to why this is now showing up.

I'm sorry I wrote several messages in a row, as the thoughts were coming to me. I think the reason is that mprotect creates a new range, since it needs to have distinct protection attributes, hence doubling the number of mappings.

> Maybe it's just glommed into the standard allocator's space?

No, you get to see each fiber's stack allocated separately when you cat /proc/<pid>/maps.
Oct 25 2017
On Wednesday, 25 October 2017 at 14:19:14 UTC, Jonathan M Davis wrote:
> Maybe there was a change in the OS(es) being used that affected the limit?

On Linux it's controllable by `sysctl vm.max_map_count`.
Oct 25 2017
On Wednesday, 25 October 2017 at 13:26:26 UTC, Steven Schveighoffer wrote:
> It appears that the limitation applies to mmap calls as well, and the mmap call to allocate the stack has been in Fiber since, as far as I can tell, the beginning. How has this not shown up before?

Although the number of mmap calls is the same after the stack overflow protection was added for fibers, _I think_ mprotect actually splits the single mmapped region into two (as they have different protection bits). This effectively doubles the number of regions.

As a workaround (if you have lots of fibers and don't care about stack overflow protection) you can just pass zero for the guardPageSize: https://github.com/dlang/druntime/blob/master/src/core/thread.d#L4037
Oct 25 2017