digitalmars.D - exercise on range: lexeme stream
- spir (88/88) Feb 03 2011 Hello,
Hello, I have a working lexing toolkit that allows defining a lexer from a language's "morphology", then let it scan sources. The result is a 'LexemeStream' struct, which just wraps an array of Lexeme's; for simplicity, each lexeme holds a string tag identifying the kind of lexeme, and the matched slice. Simple example: Morphology morphology = [ [ "SPACING" , `[\ \t]*` ], [ "OPEN_GROUP" , `\(` ], [ "CLOSE_GROUP" , `\)` ], [ "number" , `[\+\-]?[0-9]+(\.[0-9]+)?` ], [ "symbol" , `[a-zA-A][a-zA-A0-9]*` ], [ "operator" , `[\+\-\*\/]` ], ]; auto lexer = new Lexer(morphology); // LexemeStream struct: auto lexemes = lexer.lexemes(source); LexemeStream mainly provides a single operational method 'match' that works more or less like D's builtin 'in' operator: Lexeme* match (string tag) It returns a pointer to the current lexeme if said lexeme is of the right kind, else null. LexemeStream maintains its own cursor (an ordinal index), so that a client parser does not need to update and pass around indices. Very practical. It also exposes an (automagically maintained) 'empty' boolean member. For instance, matching a number may look like: NumberNode readNumber (LexemeStream lexemes) { // no index! auto lexeme = lexemes.match("number"); if (!lexeme) return null; return new NumberNode(lexeme.slice); // cursor has advanced for further parsing } From this point of view, LexemeStream behaves similarly to an input range. But match does both popFront and front, and even also maintains empty in the background. I wonder about the best way to rewrite LexemeStream into a regular range. (*) (**) Random thoughts: I wonder in fact whether LexemeStream matches (no pun intended ;-) the general notion of range, or instead is a different sort of beast; if the latter, then what? The key point, maybe, is that the stream advances /on conditon/. In other words, we do not just traverse it like for a regular iteration. We may need a parameter to popFront holding said conditon (here, the 'tag'). Then, front could return null (or whatever flag value) when the condition is not fulfilled, in which case the range does not move forward at all. The above func would then look like: NumberNode readNumber (LexemeStream lexemes) { // no index! lexemes.popFront("number"); auto lexeme = lexemes.front(); if (!lexeme) return null; return new NumberNode(lexeme.slice); } Another solution may be for popFront to return a boolean flag; then, a gentleman agreement is to read front only in positive (true) case: NumberNode readNumber (LexemeStream lexemes) { // no index! match = lexemes.popFront("number"); if (!match) return null; auto lexeme = lexemes.front(); return new NumberNode(lexeme.slice); } Denis (*) A side issue (easily solvable) is that there must be a cursor. Meaning, LexemeStream cannot just shrink its internal array when matching. The reason is combinations of composite and choice patterns may require backtracking in lexeme stream: T readAssignment (LexemeStream lexemes) { auto cursor = lexemes.cursor // store initial cursor // ... try & match target name ... // ... try & match '=' ... if (!lexeme) { lexemes.cursor = cursor0; // restore intiial cursor return null; } // ... try & match value expression ... } T readStatement (LexemeStream lexemes) { // read assignment|func-call|control-statement|... } A solution is to make popFront advance the cursor instead of eating the array, front return array[cursor] instead of array[0], and empty == (cursor >= array.length) instead of (array.length == 0). (**) I cannot really do it as aof now anyway, because of the bug (in formatValue template constraints) preventing a range to define toString, but am still highly interested in the exercise --if only for the future. -- _________________ vita es estrany spir.wikidot.com
Feb 03 2011