www.digitalmars.com

digitalmars.D.learn - Should a parser type be a struct or class?

Per Nordlöw <per.nordlow gmail.com> writes:
Should a range-compliant aggregate type realizing a parser be 
encoded as a struct or class? In dmd `Lexer` and `Parser` are 
both classes.

In general how should I reason about whether an aggregate type 
should be encoded as a struct or class?
Jun 17 2020
Simen Kjærås <simen.kjaras gmail.com> writes:
On Wednesday, 17 June 2020 at 11:50:27 UTC, Per Nordlöw wrote:
 Should a range-compliant aggregate type realizing a parser be 
 encoded as a struct or class? In dmd `Lexer` and `Parser` are 
 both classes.

 In general how should I reason about whether an aggregate type 
 should be encoded as a struct or class?
The heuristic I use is 'do I need polymorphism?' If not, it's a struct.

Another thing that may be worth considering is reference semantics. Those are easy to get with a struct, while polymorphism is generally a class-only thing (but check out Tardy, which Atila Neves recently posted in the Announce group).

I would say I basically never use classes in D: pointers and arrays give me all the reference semantics I need, and I almost never need polymorphism.

--
  Simen
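To illustrate the point that reference semantics don't require a class, here is a minimal sketch with hypothetical names (`Cursor`, `skip`, `skipRef`): a plain struct is shared through a pointer or a `ref` parameter, while ordinary assignment still copies it. The `unittest` can be run with e.g. `dmd -unittest -main -run cursor.d`.

```d
// Hypothetical cursor over some text: a plain value type, no class involved.
struct Cursor
{
    string text;
    size_t pos;
}

// Taking a pointer mutates the caller's cursor, much like passing a class reference.
void skip(Cursor* c, size_t n)
{
    c.pos += n; // `.` on a struct pointer dereferences automatically in D
}

// A `ref` parameter gives the same sharing without `&` at the call site.
void skipRef(ref Cursor c, size_t n)
{
    c.pos += n;
}

unittest
{
    auto a = Cursor("hello world");
    skip(&a, 6);
    skipRef(a, 1);
    assert(a.pos == 7);

    auto copy = a;      // plain assignment copies by value
    copy.pos = 0;
    assert(a.pos == 7); // the original is unaffected

    Cursor* p = new Cursor; // GC-allocated struct, accessed through a pointer
    Cursor* q = p;          // copying the pointer aliases the same object
    q.pos = 2;
    assert(p.pos == 2);
}
```

Which of the two you reach for is a per-call-site decision, which is part of why a struct can cover both by-value and by-reference uses.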
Jun 17 2020
Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Wednesday, 17 June 2020 at 11:50:27 UTC, Per Nordlöw wrote:
 Should a range-compliant aggregate type realizing a parser be 
 encoded as a struct or class? In dmd `Lexer` and `Parser` are 
 both classes.

 In general how should I reason about whether an aggregate type 
 should be encoded as a struct or class?
What's a range-compliant aggregate type? Ranges are typically views of someone else's data; an owner of the data wouldn't store mutable iterators, and won't be a range. For that reason too, ranges are structs, as most of them are thin wrappers over a set of iterators with an interface to mutate them.

If you *really* need runtime polymorphism as provided by the language, use a class. Otherwise, use a struct. It's pretty straightforward. Even then, in some cases one can realize their own runtime polymorphism without classes (look at e.g. Atila Neves' 'tardy' library).

It's very easy to implement a lexer as an input range: it'd just be a pointer into a buffer plus some additional iteration data (like line/column position, for example), i.e. a struct. Making it a struct also lets you make it a forward range instead of an input range, which is useful if you need lookahead:

```d
struct TokenStream
{
    this(SourceBuffer source)
    {
        this.cursor = source.text.ptr;
        advance(this); // lex the first token
    }

    bool empty() const { return token.type == TokenType.eof; }

    ref front() return scope const { return token; }

    void popFront()
    {
        switch (token.type)
        {
            default:
                advance(this);
                break;
            case TokenType.eof:
                break;
            case TokenType.error:
                // after the error token has been seen, become an empty range
                token.type = TokenType.eof;
                token.lexSpan = LexicalSpan(token.lexSpan.end, token.lexSpan.end);
                break;
        }
    }

    // copying the struct copies the cursor, which is all `save` needs
    TokenStream save() const { return this; }

private:
    const(char)* cursor;
    Location location;
    Token token;
}
```

where `advance` is implemented as a module-private function that actually parses the source into the next token.

DMD's Lexer/Parser aren't ranges. They're ourobori.
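The `TokenStream` above relies on types (`SourceBuffer`, `Token`, `LexicalSpan`, `Location`) and an `advance` function that aren't shown. For something that compiles on its own, here is a minimal sketch of the same shape under made-up assumptions: a toy lexer for integers, `+` and `-`, holding a `string` slice instead of a raw pointer, with `advance` as a private member. Because the whole state is a cursor plus the current token, copying the struct is all `save` has to do, which is what makes lookahead cheap.

```d
import std.ascii : isDigit, isWhite;
import std.range.primitives : isForwardRange;

// Hypothetical token kinds for a toy lexer.
enum TokenType { number, plus, minus, error, eof }

struct Token
{
    TokenType type;
    string text; // slice of the source, no copying
}

// A lexer as a forward range: the unconsumed input plus the current token.
struct TokenStream
{
    this(string source)
    {
        rest = source;
        advance(); // lex the first token
    }

    bool empty() const { return current.type == TokenType.eof; }
    Token front() const { return current; }

    void popFront()
    {
        if (current.type == TokenType.error)
            current = Token(TokenType.eof); // stop after one error token
        else
            advance();
    }

    // Copying the struct copies the cursor, which is all `save` needs.
    TokenStream save() const { return this; }

private:
    string rest;   // input not yet lexed
    Token current; // token at the front of the range

    void advance()
    {
        while (rest.length && isWhite(rest[0]))
            rest = rest[1 .. $];
        if (rest.length == 0)
        {
            current = Token(TokenType.eof);
            return;
        }
        size_t n = 1;
        TokenType t;
        if (rest[0] == '+')
            t = TokenType.plus;
        else if (rest[0] == '-')
            t = TokenType.minus;
        else if (isDigit(rest[0]))
        {
            while (n < rest.length && isDigit(rest[n]))
                ++n;
            t = TokenType.number;
        }
        else
            t = TokenType.error;
        current = Token(t, rest[0 .. n]);
        rest = rest[n .. $];
    }
}

static assert(isForwardRange!TokenStream);

unittest
{
    auto ts = TokenStream("12 + 3");
    auto peek = ts.save; // lookahead on a copy
    peek.popFront();
    assert(peek.front.type == TokenType.plus);
    assert(ts.front.text == "12"); // the original has not moved
}
```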
Jun 17 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 17, 2020 at 11:50:27AM +0000, Per Nordlöw via Digitalmars-d-learn
wrote:
 Should a range-compliant aggregate type realizing a parser be encoded
 as a struct or class?
Preferably a struct IMO, but see below.
 In dmd `Lexer` and `Parser` are both classes.
Probably for historical reasons.
 In general how should I reason about whether an aggregate type should
 be encoded as a struct or class?
1) Does it need runtime polymorphism? If it does, use a class. If not, probably a struct.

2) Does it make more sense as a by-value type or a by-reference type? In several of my projects, for example, I've had aggregate types start out as structs (because of (1)) but eventually get rewritten as (final) classes, because I found myself using `ref` or `&` everywhere to get by-reference semantics.

My rule of thumb is basically adopted from TDPL (Andrei Alexandrescu's "The D Programming Language"): a struct is a "glorified int" with by-value semantics; a class is a more traditional OO object. If my aggregate behaves like a glorified int, then a struct is a good choice. If it behaves more like a traditional OO encapsulated type, then a class is probably the right answer.

T

-- 
Many open minds should be closed for repairs. -- K5 user
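A small sketch of the two rules of thumb, with hypothetical types: `Money` is a "glorified int" whose copies are independent, while `SymbolTable` is a `final` class, so passing it around shares one object without any `ref` or `&`.

```d
// Hypothetical "glorified int": copying yields an independent value.
struct Money
{
    long cents;

    Money opBinary(string op : "+")(Money rhs) const
    {
        return Money(cents + rhs.cents);
    }
}

// Hypothetical OO-style object with identity; `final` because no subclassing is planned.
final class SymbolTable
{
    private bool[string] names;

    void define(string name) { names[name] = true; }
    bool has(string name) const { return (name in names) !is null; }
}

// No `ref` needed: the class reference already shares the table with the caller.
void declareBuiltins(SymbolTable table)
{
    table.define("print");
}

unittest
{
    auto a = Money(150);
    auto b = a; // independent copy
    b = b + Money(50);
    assert(a.cents == 150 && b.cents == 200);

    auto table = new SymbolTable;
    declareBuiltins(table);
    assert(table.has("print"));
}
```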
Jun 17 2020
Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 17 June 2020 at 11:50:27 UTC, Per Nordlöw wrote:
 Should a range-compliant aggregate type realizing a parser be 
 encoded as a struct or class? In dmd `Lexer` and `Parser` are 
 both classes.

 In general how should I reason about whether an aggregate type 
 should be encoded as a struct or class?
I would say a struct. In dmd, Parser even inherits from Lexer, which seems to be a quirky design. Especially for multi-threaded parsing, you might want more control over memory layout than classes usually give you.
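On the memory-layout point, one thing a struct allows is storing lexer or parser state inline: an array of structs is a single contiguous block, and each element can be handed to a different worker thread without a separate heap allocation per object. A minimal sketch using Phobos' `std.parallelism.parallel`; `WordCounter` is a hypothetical stand-in for real lexer state, and its "tokenizing" is just whitespace splitting.

```d
import std.array : split;
import std.parallelism : parallel;

// Hypothetical per-input lexer state, stored by value.
struct WordCounter
{
    string source;
    size_t tokens;

    void run() { tokens = source.split.length; } // whitespace-separated "tokens"
}

unittest
{
    auto jobs = [WordCounter("a b c"), WordCounter("x"), WordCounter("")];

    // Elements sit next to each other in memory, not behind separate references.
    assert(&jobs[1] == &jobs[0] + 1);

    // Each worker mutates only its own element, so no locking is needed.
    foreach (ref job; parallel(jobs))
        job.run();

    assert(jobs[0].tokens == 3 && jobs[1].tokens == 1 && jobs[2].tokens == 0);
}
```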
Jun 17 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 17 June 2020 at 14:24:01 UTC, Stefan Koch wrote:
 Parser in dmd does even inherit from Lexer.
why would a parser ever inherit from a lexer?
Jun 17 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 17, 2020 at 02:32:09PM +0000, Adam D. Ruppe via Digitalmars-d-learn
wrote:
 On Wednesday, 17 June 2020 at 14:24:01 UTC, Stefan Koch wrote:
 Parser in dmd does even inherit from Lexer.
why would a parser ever inherit from a lexer?
Because, unlike a regular parser-driven compiler, dmd is a lexer-driven one. :-D

T

-- 
The volume of a pizza of thickness a and radius z can be described by the following formula: pi zz a. -- Wouter Verhelst
Jun 17 2020
welkam <wwwelkam gmail.com> writes:
On Wednesday, 17 June 2020 at 14:32:09 UTC, Adam D. Ruppe wrote:
 On Wednesday, 17 June 2020 at 14:24:01 UTC, Stefan Koch wrote:
 Parser in dmd does even inherit from Lexer.
why would a parser ever inherit from a lexer?
So you can write nextToken() instead of lexer.nextToken()
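If the only goal is writing `nextToken()` rather than `lexer.nextToken()`, D offers a way to get that without class inheritance: composition plus `alias this`, which forwards member lookups the outer struct can't resolve to the aliased field. A minimal sketch with hypothetical `Lexer` and `Parser` structs (this is not how dmd is written):

```d
// Hypothetical lexer: yields space-separated words, "" at end of input.
struct Lexer
{
    string rest;

    string nextToken()
    {
        size_t i = 0;
        while (i < rest.length && rest[i] == ' ')
            ++i; // skip spaces
        size_t j = i;
        while (j < rest.length && rest[j] != ' ')
            ++j; // scan one word
        auto tok = rest[i .. j];
        rest = rest[j .. $];
        return tok;
    }
}

// Parser "has a" Lexer; `alias this` forwards unknown members to it.
struct Parser
{
    Lexer lexer;
    alias lexer this;

    string[] parseWords()
    {
        string[] words;
        // `nextToken` isn't a Parser member, so this resolves to lexer.nextToken()
        for (auto tok = this.nextToken(); tok.length; tok = this.nextToken())
            words ~= tok;
        return words;
    }
}

unittest
{
    auto p = Parser(Lexer("let x = 1"));
    assert(p.nextToken() == "let"); // no `lexer.` qualifier needed
    assert(p.parseWords() == ["x", "=", "1"]);
}
```

Unlike inheritance, this keeps `Parser` a value type and doesn't make it usable wherever a `Lexer` is expected by virtual dispatch.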
Jun 18 2020
welkam <wwwelkam gmail.com> writes:
Oh, and also: https://github.com/dlang/dmd/pull/9899
Jun 18 2020
user1234 <user1234 12.de> writes:
On Wednesday, 17 June 2020 at 11:50:27 UTC, Per Nordlöw wrote:
 Should a range-compliant aggregate type realizing a parser be 
 encoded as a struct or class? In dmd `Lexer` and `Parser` are 
 both classes.

 In general how should I reason about whether an aggregate type 
 should be encoded as a struct or class?
You have the example of libdparse, which shows that using a class can be a good idea [1] [2]. For DCD, the parser overrides a few things, because otherwise completion does not work properly or has scope issues. But TBH there aren't many reasons to use a class otherwise.

[1] https://github.com/dlang-community/dsymbol/blob/master/src/dsymbol/conversion/package.d#L102
[2] https://github.com/dlang-community/dsymbol/blob/master/src/dsymbol/conversion/package.d#L138
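A sketch of the kind of situation where a class pays off, as in the DCD case: with a visitor base class, a subclass overrides only the hooks it cares about and inherits the rest of the traversal. The node types and `Visitor` below are hypothetical and are not libdparse's actual API.

```d
// Hypothetical AST node types.
class Node {}

class Identifier : Node
{
    string name;
    this(string n) { name = n; }
}

class Block : Node
{
    Node[] children;
    this(Node[] c) { children = c; }
}

// Base visitor: the default behaviour walks the whole tree.
class Visitor
{
    void visit(Node n)
    {
        if (auto b = cast(Block) n)
            visitBlock(b);
        else if (auto id = cast(Identifier) n)
            visitIdentifier(id);
    }

    void visitBlock(Block b)
    {
        foreach (child; b.children)
            visit(child);
    }

    void visitIdentifier(Identifier id) {}
}

// A subclass overrides just one hook, e.g. to collect names for completion.
final class NameCollector : Visitor
{
    string[] names;

    override void visitIdentifier(Identifier id) { names ~= id.name; }
}

unittest
{
    Node[] inner = [new Identifier("b")];
    Node[] outer = [new Identifier("a"), new Block(inner)];
    Node tree = new Block(outer);

    auto c = new NameCollector;
    c.visit(tree); // traversal comes from the base class, names from the override
    assert(c.names == ["a", "b"]);
}
```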
Jun 17 2020
Meta <jared771 gmail.com> writes:
On Wednesday, 17 June 2020 at 11:50:27 UTC, Per Nordlöw wrote:
 Should a range-compliant aggregate type realizing a parser be 
 encoded as a struct or class? In dmd `Lexer` and `Parser` are 
 both classes.

 In general how should I reason about whether an aggregate type 
 should be encoded as a struct or class?
IMO it doesn't need to be. However, it's worth saying that range semantics aren't a great fit for parsers, at least in my experience. Parsers need to be able to "synchronize" to recover from syntax errors, which does not fit into the range API very well. You can probably fit it in somewhere in popFront, front or empty, as your implementation permits, but I find it's just easier to forgo the range interface and implement whatever primitives you need; *then* you can add a range interface on top that models the output of the parser as a range of expressions, or whatever you want.
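A minimal sketch of that layering, under made-up assumptions: tokens arrive pre-split as strings, a statement is `name = number ;`, and on a syntax error the parser synchronizes by skipping past the next `;` (classic panic-mode recovery). The parser exposes plain primitives (`atEnd`, `parseAssignment`, `synchronize`); `Statements` is a thin input range layered on top that yields only the statements that parsed. All names are hypothetical.

```d
import std.algorithm.searching : all;
import std.ascii : isAlpha, isDigit;
import std.conv : to;

struct Assignment { string name; int value; }

// Hypothetical parser with plain primitives; no range API on the parser itself.
struct Parser
{
    string[] tokens;
    string[] errors;

    bool atEnd() const { return tokens.length == 0; }

    // Parses `name = number ;`; on bad input records an error, recovers, returns false.
    bool parseAssignment(out Assignment result)
    {
        assert(!atEnd, "caller must check atEnd first");
        if (tokens.length >= 4
            && tokens[0].length && isAlpha(tokens[0][0])
            && tokens[1] == "="
            && tokens[2].length && tokens[2].all!isDigit // assumed to fit in an int
            && tokens[3] == ";")
        {
            result = Assignment(tokens[0], tokens[2].to!int);
            tokens = tokens[4 .. $];
            return true;
        }
        errors ~= "syntax error near '" ~ tokens[0] ~ "'";
        synchronize();
        return false;
    }

    // Panic-mode recovery: drop tokens up to and including the next `;`.
    void synchronize()
    {
        while (tokens.length && tokens[0] != ";")
            tokens = tokens[1 .. $];
        if (tokens.length)
            tokens = tokens[1 .. $];
    }
}

// Thin input range over the parser's output: successfully parsed statements.
struct Statements
{
    Parser* parser;
    Assignment current;
    bool done;

    this(Parser* p) { parser = p; popFront(); }

    bool empty() const { return done; }
    Assignment front() const { return current; }

    void popFront()
    {
        while (!parser.atEnd)
            if (parser.parseAssignment(current))
                return;
        done = true;
    }
}

unittest
{
    auto p = Parser(["x", "=", "1", ";", "oops", "=", ";", "y", "=", "2", ";"]);
    string[] names;
    foreach (stmt; Statements(&p))
        names ~= stmt.name;
    assert(names == ["x", "y"]);
    assert(p.errors.length == 1);
}
```

Keeping recovery inside the parser primitives means the range wrapper stays trivial: its `popFront` just retries until something parses or the input runs out.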
Jun 18 2020