digitalmars.D - Lexers (again)
- Brian Schott (3/3) Dec 13 2013 I've been working on the next attempt at a std.lexer /
- Rikki Cattermole (11/14) Dec 13 2013 A problem I noticed was that you're using ubyte[], at least in the
- Martin Nowak (5/8) Dec 13 2013 Looks promising.
- Brian Schott (11/14) Dec 15 2013 I've ported DScanner over to this new lexer code. It's on a
- Timon Gehr (2/15) Dec 15 2013 I cannot reproduce your problem. If this does not work, it is a bug.
- Andrei Alexandrescu (17/38) Dec 15 2013 The problem is that tok is a dynamic value. It should be a static value....
- Timon Gehr (20/35) Dec 15 2013 Note that the spec has this to say:
- Brian Schott (9/15) Dec 16 2013 This seems to have fixed the case/goto issues.
- Jonas Drewsen (7/10) Dec 16 2013 nitpicking... but shouldn't:
I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work
Dec 13 2013
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work

A problem I noticed was that you're using ubyte[], at least in the runlexer. Does it work with string and wstring, though? Also, why is it required to pass the lexer the type of the code to parse? Is there another way to make it easier to use, or is the only way to wrap the constructor in a templated function? There also seem to be a lot of generic type method implementations in DLexer that I would expect to be done inside the Lexer super (well, template, I suppose). All in all, it looks promising.
Dec 13 2013
On 12/13/2013 11:17 AM, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work

Looks promising. I hope that I find some time to work on a completely generic DFA lexer generator (regex based). I found a few papers and had some ideas on how to vectorize the DFA processing to make it fast enough.
Dec 13 2013
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work

I've ported DScanner over to this new lexer code. It's on a branch here: https://github.com/Hackerpilot/Dscanner/tree/NewLexer.

One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with

    case tok!"class":

it does have a problem with

    goto case tok!"class":

I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?
Dec 15 2013
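The label workaround described above might look roughly like the following. This is only a sketch: the `Kind` enum, its members, the `classify` function, and the `aggregate` label are hypothetical stand-ins, not names from the lexer-work or Dscanner repositories.

```d
import std.stdio : writeln;

// Hypothetical token kinds standing in for the lexer's real token IDs.
enum Kind : ubyte { class_, struct_, identifier }

string classify(Kind k)
{
    switch (k)
    {
    case Kind.struct_:
        // Instead of `goto case Kind.class_;`, jump to an ordinary label.
        goto aggregate;
    case Kind.class_:
    aggregate:
        return "aggregate";
    default:
        return "other";
    }
}

void main()
{
    writeln(classify(Kind.struct_));
    writeln(classify(Kind.identifier));
}
```

The label sits directly after the `case` it mirrors, so both the normal case dispatch and the explicit `goto` reach the same statement.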
On 12/15/2013 12:12 PM, Brian Schott wrote:
> On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
>> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work
>
> I've ported DScanner over to this new lexer code. It's on a branch here: https://github.com/Hackerpilot/Dscanner/tree/NewLexer. One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with case tok!"class": it does have a problem with goto case tok!"class": I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?

I cannot reproduce your problem. If this does not work, it is a bug.
Dec 15 2013
On 12/15/13 3:45 AM, Timon Gehr wrote:
> On 12/15/2013 12:12 PM, Brian Schott wrote:
>> On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
>>> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work
>>
>> I've ported DScanner over to this new lexer code. It's on a branch here: https://github.com/Hackerpilot/Dscanner/tree/NewLexer. One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with case tok!"class": it does have a problem with goto case tok!"class": I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?
>
> I cannot reproduce your problem. If this does not work, it is a bug.

The problem is that tok is a dynamic value. It should be a static value. Current code:

    static @property IDType tok(string symbol)() { ... }

It should be:

    template IDType tok(string symbol)() { alias tok = ...; }

This is important - if the compiler thinks tok is a dynamic value, it'll generate crappy switch statements.

BTW Brian - I didn't look at this in depth yet but it's very promising work. Thanks!

Andrei
Dec 15 2013
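One way to spell the "static value" form in valid D is an eponymous template with an enum member, so that tok!"class" is a manifest constant instead of a function call. The sketch below rests on assumptions: the `tokenNames` table, the `lookupToken` helper, the ID numbering, and the choice of `ubyte` for `IDType` are all hypothetical and are not taken from the lexer-work repository.

```d
import std.stdio : writeln;

alias IDType = ubyte; // assumed underlying type

// Hypothetical token table; index + 1 is the token ID, 0 means "unknown".
immutable string[] tokenNames = ["class", "struct", "identifier"];

// Ordinary function; it runs at compile time (CTFE) when it
// initializes the enum inside the template below.
IDType lookupToken(string symbol)
{
    foreach (i, name; tokenNames)
        if (name == symbol)
            return cast(IDType)(i + 1);
    return 0;
}

// Eponymous template: tok!"class" names a compile-time constant.
template tok(string symbol)
{
    enum IDType id = lookupToken(symbol);
    static assert(id != 0, "unknown token: " ~ symbol);
    enum IDType tok = id;
}

string classify(IDType t)
{
    switch (t)
    {
    case tok!"struct":
        goto case tok!"class"; // the operand is a constant, like the case label
    case tok!"class":
        return "aggregate";
    default:
        return "other";
    }
}

void main()
{
    static assert(tok!"class" == 1); // usable wherever a constant is required
    writeln(classify(tok!"struct"));
}
```

Because every tok!"..." here is an enum, the case labels and the `goto case` operand are all compile-time constants, and a misspelled token name is rejected by the `static assert` at compile time.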
On 12/15/2013 05:38 PM, Andrei Alexandrescu wrote:
>>> One limitation I've noticed with the new tok!"tokenName" approach is that while dmd has no problem with case tok!"class": it does have a problem with goto case tok!"class": I managed to work around this by adding new labels and "goto"-ing them instead. Is this a bug or intentional?
>>
>> I cannot reproduce your problem. If this does not work, it is a bug.
>
> The problem is that tok is a dynamic value. It should be a static value.

Note that the spec has this to say: http://dlang.org/statement.html#SwitchStatement

"Expression is evaluated. The result type T must be of integral type or char[], wchar[] or dchar[]. The result is compared against each of the case expressions. If there is a match, the corresponding case statement is transferred to. The case expressions must all evaluate to a constant value or array, or a runtime initialized const or immutable variable of integral type. They must be implicitly convertible to the type of the switch Expression. Case expressions must all evaluate to distinct values. Const or immutable variables must all have different names. If they share a value, the first case statement with that value gets control. There must be exactly one default statement."

Arguably, this is a questionable language design decision that should IMO be revisited anyway, but DMD clearly does not follow the spec here.

Also, there is this:

"The fourth form, goto case Expression;, transfers to the CaseStatement of the innermost enclosing SwitchStatement with a matching Expression."

It does not say anything about what kind of expression is required.
Dec 15 2013
On Sunday, 15 December 2013 at 16:38:15 UTC, Andrei Alexandrescu wrote:
> The problem is that tok is a dynamic value. It should be a static value. Current code:

This seems to have fixed the case/goto issues.

> This is important - if the compiler thinks tok is a dynamic value, it'll generate crappy switch statements.

It seems it's hard to keep dmd from generating crappy code even with this fix. I tried it with both LDC and DMD. The code from DMD takes 3.5 times as long to execute.

> BTW Brian - I didn't look at this in depth yet but it's very promising work. Thanks!

It's based off of the gist you posted a while back. I'll have to compare this to what you(r team) came up with for Facebook's C++ analyzer.
Dec 16 2013
On Friday, 13 December 2013 at 10:17:49 UTC, Brian Schott wrote:
> I've been working on the next attempt at a std.lexer / std.d.lexer recently. You can follow the progress on GitHub here: https://github.com/Hackerpilot/lexer-work

Nitpicking... but shouldn't:

    size_t line() pure nothrow const @property { return _line; }

be more like:

    @property size_t line() pure nothrow const { return _line; }

to be consistent with Phobos coding style?

/Jonas
Dec 16 2013