digitalmars.D - Explicit TCE
- Tyler Jameson Little (52/52) Oct 12 2012 I've read a few threads discussing tail call elimination, but
- =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= (9/57) Oct 12 2012 I'm a big fan of explicit, guaranteed TCE.
- Tyler Jameson Little (11/15) Oct 12 2012 Ugh... I thought that might be a problem.
- David Nadlinger (12/15) Oct 12 2012 LLVM shouldn't be as big a problem – there is some support for
- Tyler Jameson Little (33/48) Oct 12 2012 I found this:
- Dmitry Olshansky (8/56) Oct 12 2012 I suspect it's so called switch-call or forced tail call. I proposed
- Tyler Jameson Little (90/91) Oct 12 2012 I'm not sure which part wasn't clear, so I'll try to explain
- Dmitry Olshansky (13/75) Oct 12 2012 Consider:
- Tyler Jameson Little (12/28) Oct 12 2012 That would work too. If scope() is disallowed, it doesn't matter
- Dmitry Olshansky (13/26) Oct 12 2012 Hey, that's dmd (compiler) using a ton of memory, not std.regex :(
- Tyler Jameson Little (9/20) Oct 12 2012 I think getting Walter Bright on board is the best starting
- bearophile (9/11) Oct 12 2012 From what I've seen LLVM devs seem open enough to good patches.
- bearophile (9/31) Oct 12 2012 Are you talking about CPS?
- Tyler Jameson Little (52/85) Oct 12 2012 I don't think it would necessitate CPS, but that is a nice side
I've read a few threads discussing tail call elimination, but they all wanted the spec to articulate specific circumstances where tail call elimination is required. Has there been any thought to adding syntax to explicitly state tail call elimination? D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls. This was briefly mentioned years ago in this forum, but the become keyword was ignored: http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html I think D could do something like this: int fact(int n, int accum = 1) { if (n == 0) { return accum } // become means return, but guarantees TCE become fact(n - 1, accum * n) } DMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated. Then, more interesting things can be implemented more simply, like a state machine: void stateA() { become stateB(); } void stateB() { become stateC(); } void stateC() { return; } void main() { become stateA(); } This would only take a single stack frame. If there were conditionals in there for branching, this could end up with a stack overflow because DMD does not support complicated TCE, only simple recursive TCE. The become keyword would probably have to have these properties: * statement after become must only be a function call (can't do foo() + 3) * scope() needs to be handled appropriately for functions with become There might be others. I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime. A requirement could be that functions called with become need to have a static stack size, since this stack might never be collected (in the case of an infinite state machine). Adding this feature wouldn't cost much, but it would add a ton of functional power.
Oct 12 2012
On 12-10-2012 19:29, Tyler Jameson Little wrote:I've read a few threads discussing tail call elimination, but they all wanted the spec to articulate specific circumstances where tail call elimination is required. Has there been any thought to adding syntax to explicitly state tail call elimination? D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls. This was briefly mentioned years ago in this forum, but the become keyword was ignored: http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html I think D could do something like this: int fact(int n, int accum = 1) { if (n == 0) { return accum } // become means return, but guarantees TCE become fact(n - 1, accum * n) } DMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated. Then, more interesting things can be implemented more simply, like a state machine: void stateA() { become stateB(); } void stateB() { become stateC(); } void stateC() { return; } void main() { become stateA(); } This would only take a single stack frame. If there were conditionals in there for branching, this could end up with a stack overflow because DMD does not support complicated TCE, only simple recursive TCE. The become keyword would probably have to have these properties: * statement after become must only be a function call (can't do foo() + 3) * scope() needs to be handled appropriately for functions with become There might be others. I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime. A requirement could be that functions called with become need to have a static stack size, since this stack might never be collected (in the case of an infinite state machine). Adding this feature wouldn't cost much, but it would add a ton of functional power.I'm a big fan of explicit, guaranteed TCE. However, the primary problem with this approach is a really mundane one: The major compiler back ends (GCC and LLVM) don't have any means of guaranteeing TCE... -- Alex Rønne Petersen alex lycus.org http://lycus.org
Oct 12 2012
I'm a big fan of explicit, guaranteed TCE. However, the primary problem with this approach is a really mundane one: The major compiler back ends (GCC and LLVM) don't have any means of guaranteeing TCE...Ugh... I thought that might be a problem. I don't know too much about GCC/LLVM, but I saw 'tailcallelim' for LLVM: http://llvm.org/docs/Passes.html#tailcallelim GCC seems to support it in 4.x: arxiv.org/pdf/1109.4048 These look promising, so I wouldn't completely rule out the possibility of doing it in GCC/LLVM. Perhaps someone more knowledgeable about GCC/LLVM could comment? I would really like to see D have this feature (then I can stop daydreaming about LISP).
Oct 12 2012
On Friday, 12 October 2012 at 17:39:53 UTC, Alex Rønne Petersen wrote:However, the primary problem with this approach is a really mundane one: The major compiler back ends (GCC and LLVM) don't have any means of guaranteeing TCE...LLVM shouldn't be as big a problem – there is some support for guaranteed TCO in order to make implementations of some of the functional languages possible. I know that you can force LLVM to tail-call everything it possibly can (which in consequence horribly breaks the ABI), but I am not sure right now how fine-grained you can control that mechanism. Also don't forget that some calling conventions don't lend themselves particularly well for doing efficient tail calls. David
Oct 12 2012
On Friday, 12 October 2012 at 20:23:00 UTC, David Nadlinger wrote:On Friday, 12 October 2012 at 17:39:53 UTC, Alex Rønne Petersen wrote:I found this: http://llvm.org/docs/CodeGenerator.html#tail-call-optimization http://llvm.org/docs/CodeGenerator.html#target-feature-matrix It seems that llvm won't be a problem. I've never worked with LLVM (or any compiler for that matter) at this low of a level, but I assume that the front-end produces code that looks like the provided code snippet in the first link. If that's the case, then we can basically guarantee that LLVM will do what we expect, as long as we can guarantee that all callers and callees use "fastcc". I'm not 100% on the implications of this, but it should work. As for GCC, the situation seems less hopeful. I found this thread about GUILE, but it did mention GCC's lack of support for tail calls. This was april of last year, so maybe things have improved. The thread does mention that the GCC devs would be open to suggestions, but it seems like this might be a harder fought battle than for LLVM. http://lists.gnu.org/archive/html/guile-devel/2011-04/msg00055.html LLVM should be sufficient though, right? GDC can just outright reject explicit TCO for now until it supports proper TCO. Maybe the GUILE mailing list would be a good place to start, since there may be efforts already there. What steps would need to happen for this to become a reality? Here's my list: 1. Get Walter Bright/Andrei Alexandrescu on board 2. Verify that it will work with LLVM 3. Get it working in DMD 4. Get it working in LDC 5. Work with GCC devs Is there enough interest in this to implement it? I really don't know DMD or LLVM at all, so I don't know how big of a project this is.However, the primary problem with this approach is a really mundane one: The major compiler back ends (GCC and LLVM) don't have any means of guaranteeing TCE...LLVM shouldn't be as big a problem – there is some support for guaranteed TCO in order to make implementations of some of the functional languages possible. I know that you can force LLVM to tail-call everything it possibly can (which in consequence horribly breaks the ABI), but I am not sure right now how fine-grained you can control that mechanism. Also don't forget that some calling conventions don't lend themselves particularly well for doing efficient tail calls. David
Oct 12 2012
On 12-Oct-12 21:29, Tyler Jameson Little wrote:I've read a few threads discussing tail call elimination, but they all wanted the spec to articulate specific circumstances where tail call elimination is required. Has there been any thought to adding syntax to explicitly state tail call elimination? D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls. This was briefly mentioned years ago in this forum, but the become keyword was ignored:I suspect it's so called switch-call or forced tail call. I proposed something to the exact effect of the below but to reuse the 'goto' keyword.http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html I think D could do something like this: int fact(int n, int accum = 1) { if (n == 0) { return accum } // become means return, but guarantees TCE become fact(n - 1, accum * n) } DMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated. Then, more interesting things can be implemented more simply, like a state machine: void stateA() { become stateB(); } void stateB() { become stateC(); } void stateC() { return; } void main() { become stateA(); } This would only take a single stack frame. If there were conditionals in there for branching, this could end up with a stack overflow because DMD does not support complicated TCE, only simple recursive TCE. The become keyword would probably have to have these properties: * statement after become must only be a function call (can't do foo() + 3) * scope() needs to be handled appropriately for functions with become There might be others. I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime. A requirement could be that functions called with become need to have a static stack size, since this stack might never be collected (in the case of an infinite state machine).No idea what you are talking about.Adding this feature wouldn't cost much, but it would add a ton of functional power.The use case in general is threaded code. In particular fast virtual machines. -- Dmitry Olshansky
Oct 12 2012
No idea what you are talking about.I'm not sure which part wasn't clear, so I'll try to explain myself. Please don't feel offended if I clarify things you already understand. An optimizable tail call must simply be a function call. The current stack frame would be replaced with the new function, so anything more complex than a simple function call would require some stack from the preceding function to stick around in the new function, thus requiring the old stack to stick around. For example, te following is not optimizable the old stack (the one with 3) needs to be maintained until foo() returns, which is not TCE. return foo() * 3 Since the old stack won't be around anymore, that leaves us with in a sticky situation with regard to scope(): http://dlang.org/statement.html#ScopeGuardStatement If the current stack is going to be replaced with data from another function call, the behavior of scope() is undefined. The scope that scope() was in has now been repurpose, but the scope is still kind of there. If scope() is allowed, they must be executed just before the tail call, otherwise it will be overwritten (or it has to stick around until the actual stack frame is cleared. Consider: void a() { become b(); } void b() { // when does this get called? scope(exit) writeln("exited"); become a(); } If we allow scope(), then the line should be written before the call to a(). If we don't, then this is a compile time error. I like disallowing it personally, because if the scope(exit) call frees some memory that is passed to a, the programmer may think that it will be called after a exits, which may not be the case. void a(void* arr) { // do something with arr become b(); } void b() { void* arr = malloc(sizeof(float) * 16); scope(exit) free(arr); become a(arr); } I just see this as being a problem for those who don't fully understand scoping and TCE. My mention of overhead was just how complicated it would be to implement. The general algorithm is (for each become keyword): * determine max stack size (consider all branches in all recursive contexts) * allocate stack size for top-level function * do normal TCE stuff (use existing stack for new call) The stack size should be known at compile time for cases like the one above (a calls b, b calls a, infinitely) to avoid infinitely expanding stack. A situation like this is a memory optimization, so forcing guaranteed stack size puts an upper-bound on memory usage, which is the whole point of TCE. If the stack is allowed to grow, there is opportunity for stack overflow. My use case for this is a simple compiler, but I'm sure this could be applied to other use cases as well. I'd like to produce code for some BNF-style grammar where each LHS is a function. Thus, my state machine wouldn't be a huge, unnatural switch statement that reads in the current state, but a series of code branches that 'become' other states, like an actual state machine. For example: A := B | C | hello B := bye | see ya C := go away void A() { char next = getNext(); if (next == 'b' || next == 's') { become B(); } if (next == 'g') { become C(); } if (next == 'h') { // consume until hello is found, or throw exception // then put some token on the stack } } void B() { // consume until 'bye' or 'see ya' } void C() { // consume until 'go away' } This would minimize memory use and allow me to write code that more closely matches the grammar. There are plenty of other use cases, but DSLs would be very easy to implement with TCE.
Oct 12 2012
On 12-Oct-12 22:17, Tyler Jameson Little wrote:No idea what you are talking about.I'm not sure which part wasn't clear, so I'll try to explain myself. Please don't feel offended if I clarify things you already understand.An optimizable tail call must simply be a function call. The current stack frame would be replaced with the new function, so anything more complex than a simple function call would require some stack from the preceding function to stick around in the new function, thus requiring the old stack to stick around.Yup.For example, te following is not optimizable the old stack (the one with 3) needs to be maintained until foo() returns, which is not TCE. return foo() * 3 Since the old stack won't be around anymore, that leaves us with in a sticky situation with regard to scope(): http://dlang.org/statement.html#ScopeGuardStatement If the current stack is going to be replaced with data from another function call, the behavior of scope() is undefined. The scope that scope() was in has now been repurpose, but the scope is still kind of there. If scope() is allowed, they must be executed just before the tail call, otherwise it will be overwritten (or it has to stick around until the actual stack frame is cleared.Consider:void a() { become b(); } void b() { // when does this get called? scope(exit) writeln("exited"); become a(); } If we allow scope(), then the line should be written before the call to a(). If we don't, then this is a compile time error. I like disallowing it personally, because if the scope(exit) call frees some memory that is passed to a, the programmer may think that it will be called after a exits, which may not be the case.I think it should be disallowed. That along with destructors.void a(void* arr) { // do something with arr become b(); } void b() { void* arr = malloc(sizeof(float) * 16); scope(exit) free(arr); become a(arr); } I just see this as being a problem for those who don't fully understand scoping and TCE.My mention of overhead was just how complicated it would be to implement. The general algorithm is (for each become keyword): * determine max stack size (consider all branches in all recursive contexts) * allocate stack size for top-level function * do normal TCE stuff (use existing stack for new call)What's wrong with just allocating a new stack _in-place_ of the old? In other words make 'become' synonym for 'reuse the current stack frame'. Effectively you still stay in constant space that is maximum of all functions being called.The stack size should be known at compile time for cases like the one above (a calls b, b calls a, infinitely) to avoid infinitely expanding stack. A situation like this is a memory optimization, so forcing guaranteed stack size puts an upper-bound on memory usage, which is the whole point of TCE. If the stack is allowed to grow, there is opportunity for stack overflow.Right.My use case for this is a simple compiler, but I'm sure this could be applied to other use cases as well. I'd like to produce code for some BNF-style grammar where each LHS is a function. Thus, my state machine wouldn't be a huge, unnatural switch statement that reads in the current state, but a series of code branches that 'become' other states, like an actual state machine.I see nice staff. My use case is optimizing virtual machine, the one inside std.regex primarily. -- Dmitry Olshansky
Oct 12 2012
That would work too. If scope() is disallowed, it doesn't matter where the stack comes from. It's only slightly cheaper to reuse the current stack (CPU), but making a new one would be lighter on memory.My mention of overhead was just how complicated it would be to implement. The general algorithm is (for each become keyword): * determine max stack size (consider all branches in all recursive contexts) * allocate stack size for top-level function * do normal TCE stuff (use existing stack for new call)What's wrong with just allocating a new stack _in-place_ of the old? In other words make 'become' synonym for 'reuse the current stack frame'. Effectively you still stay in constant space that is maximum of all functions being called.I see nice staff. My use case is optimizing virtual machine, the one inside std.regex primarily.Yeah, that is a great example! I've read some bug reports about std.regex using a ton of memory, especially with CTFE. Since regex is by definition a state machine, this would be a particularly elegant fit (granted, backreferences et al break that model, but it's still a nice metaphor). The main problem I see is working with other compilers like GCC/LLVM. If this can be done on those compilers, I don't see any major hurdle to getting this implemented.
Oct 12 2012
On 12-Oct-12 22:49, Tyler Jameson Little wrote:That would work too. If scope() is disallowed, it doesn't matter where the stack comes from. It's only slightly cheaper to reuse the current stack (CPU), but making a new one would be lighter on memory.Hey, that's dmd (compiler) using a ton of memory, not std.regex :( It actually flies with only a modest set of ram after CTFE (or rather 'if') succeeds that is :) Since regex is byI see nice staff. My use case is optimizing virtual machine, the one inside std.regex primarily.Yeah, that is a great example! I've read some bug reports about std.regex using a ton of memory, especially with CTFE.definition a state machine, this would be a particularly elegant fit (granted, backreferences et al break that model, but it's still a nice metaphor).Yeah, without backreferences it's a state machine. Still it's NFA (non-deterministic) as no DFA would do if you want to get things like captures of sub matches etc. Either way the remarkable giant switch is present ;)The main problem I see is working with other compilers like GCC/LLVM. If this can be done on those compilers, I don't see any major hurdle to getting this implemented.Perhaps the biggest one would be convincing GCC/LLVM devs to accept patches :) -- Dmitry Olshansky
Oct 12 2012
Hey, that's dmd (compiler) using a ton of memory, not std.regex :( It actually flies with only a modest set of ram after CTFE (or rather 'if') succeeds that is :)My bad. Even then, TCE wouldn't hurt.I think getting Walter Bright on board is the best starting point. If he likes the idea, I'm sure we can work out way with the GCC/LLVM devs. I saw some basic signs (noted earlier) that this may be a non-issue, as the functionality may already be there. I'll keep looking and see if I can find a definitive answer for those compilers. Would support of one of the compilers be enough, or would both be required to get this in the formal language spec?The main problem I see is working with other compilers like GCC/LLVM. If this can be done on those compilers, I don't see any major hurdle to getting this implemented.Perhaps the biggest one would be convincing GCC/LLVM devs to accept patches :)
Oct 12 2012
Dmitry Olshansky:Perhaps the biggest one would be convincing GCC/LLVM devs to accept patches :)From what I've seen LLVM devs seem open enough to good patches. They have accepted several changes to allow LLVM to become the back-end of GHC (Haskell compiler), and generally my enhancement requests to improve LDC1 life were well accepted. They have even kept open a large request enhancement of mine to optimize D vector ops in the back-end. Bye, bearophile
Oct 12 2012
Tyler Jameson Little:D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls.Are you talking about CPS? http://en.wikipedia.org/wiki/Continuation_passing_styleDMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated. Then, more interesting things can be implemented more simply, like a state machine: void stateA() { become stateB(); } void stateB() { become stateC(); } void stateC() { return; } void main() { become stateA(); }Seems nice.I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime.D doesn't currently support C99 VLAs, but it supports alloca(), so stack frames are sized dynamically. But maybe this is not a big problem for CPS. Bye, bearophile
Oct 12 2012
On Friday, 12 October 2012 at 18:02:57 UTC, bearophile wrote:Tyler Jameson Little:I don't think it would necessitate CPS, but that is a nice side effect. I'm thinking more of a recursive function call that may or may not return. For example, a process that shovels data between two network connections. If the data never stops, the function will never return. If there's some kind of a problem, then it could return with that error, and be restarted when that problem is fixed. All of this could happen with a series of function calls that use the same stack. void handleIncommingData() { if (error) { // returns directly to manageLongRunningProcess return; } // do something useful become longRunningProcess(); } void longRunningProcess() { become handleIncommingData(); } void manageLongRunningProcess() { longRunningProcess(); // there was a problem, so fix it .... // try again manageLongRunningProcess(); } Exceptions are not needed, so these can be nothrow functions, and this implementation is simpler than some complex while loop, while having the same memory footprint. CPS would make things like imitating javascript's setTimeout/setInterval possible. I don't think this is a major benefit for D because the parallelism/concurrency support is already pretty awesome. The main benefit is for implementing things like lexical analyzers (or tokenizers, whatever), which don't really depend on previous states and can emit tokens. This allows for efficient representation of recursive problems, that call functions circularly (a -> b -> c -> a -> b ...), like a state machine. I think it just allows an extra level of expressiveness without a backwards incompatible change to the language. True, you can express this same idea with trampolining, but that isn't as fun: http://stackoverflow.com/a/489860/538551 http://en.wikipedia.org/wiki/Tail-recursive_function#Through_trampolining There are still some problems that I think a LISP language would make more sense for, and for those problems, it would be great to express them in D with my other code.D could use something like Newsqueak's become keyword. If you're not familial with Newsqueak, become is just like a return, except it replaces the stack frame with the function that it calls.Are you talking about CPS? http://en.wikipedia.org/wiki/Continuation_passing_styleI'm glad you think so =DDMD should optimize this already, but explicitly stating become is a key to the compiler that the user wants this call to be eliminated. Then, more interesting things can be implemented more simply, like a state machine: void stateA() { become stateB(); } void stateB() { become stateC(); } void stateC() { return; } void main() { become stateA(); }Seems nice.Well, dynamic stack frames aren't strictly a bad thing for CPS, it just removes the memory use guarantee. There's already a huge memory gain from using TCE, I just don't know debugging would be done if a function keeps adding to and passing a dynamic array.I'm not sure how D handles stack sizes, so this may be an issue as well if stack sizes are determined at runtime.D doesn't currently support C99 VLAs, but it supports alloca(), so stack frames are sized dynamically. But maybe this is not a big problem for CPS.
Oct 12 2012