digitalmars.D - Feedback needed: Complete symbol approach for Bison's D backend

Adela Vais <adela.vais99 gmail.com> writes:
Hello!

I need some feedback about the return value of yylex() in Bison's 
Lexer class, which must be provided by the user.

This method should provide the Bison parser with three values: 
the TokenKind (which is the current return value), the semantic 
value, and the location (optional parameter). The last two are 
set in yylex(), stored in the lexer class, and retrieved by the 
Bison parser through getters.

The other parsers provide the option of complete symbols, which 
means that yylex()'s return value is changed to a structure that 
binds together the TokenKind, the semantic value, and the 
location. Internally, the structure is immediately divided into 
its components, which continue to be used separately throughout 
the parser.
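
To make the two interfaces concrete, here is a minimal sketch in D. All names (`SplitLexer`, `SymbolLexer`, `Symbol`, `Location`, `OneNumberLexer`) are hypothetical and only illustrate the idea; they are not the API that Bison generates.

```d
// Hypothetical names throughout; this is not Bison's generated API.
enum TokenKind { EOF, NUM, PLUS }

struct Location { int beginColumn; int endColumn; }

// Style 1: yylex() returns only the kind. The semantic value and the
// location are stored in the lexer and fetched through getters.
interface SplitLexer
{
    TokenKind yylex();
    int semanticVal();
    Location location();
}

// Style 2: a complete symbol binds kind, value, and location together.
struct Symbol
{
    TokenKind kind;
    int value;
    Location loc;
}

interface SymbolLexer
{
    Symbol yylex();
}

// A toy lexer that yields one number and then end of input.
final class OneNumberLexer : SymbolLexer
{
    private bool done;

    Symbol yylex()
    {
        if (done)
            return Symbol(TokenKind.EOF, 0, Location(2, 2));
        done = true;
        return Symbol(TokenKind.NUM, 42, Location(1, 2));
    }
}

void main()
{
    auto lexer = new OneNumberLexer;
    Symbol s = lexer.yylex();

    // The parser can split the symbol into its parts right away
    // and keep using them separately.
    TokenKind kind = s.kind;
    int value = s.value;
    Location loc = s.loc;

    assert(kind == TokenKind.NUM && value == 42 && loc.beginColumn == 1);
    assert(lexer.yylex().kind == TokenKind.EOF);
}
```

With the second style, a user cannot return a token and forget to set its value or location, because all three travel in one return value.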

The big advantage of the complete symbol is that it is 
beginner-friendly and reduces the risk of errors caused by 
the user forgetting to set one of the values.
The main disadvantage is the possible overhead the structure adds 
to the parser. It will be created and destroyed for each 
discovered token.

Should we keep both versions, or move to a complete symbol 
approach? Given that Bison's current release still has D as an 
experimental feature, this would not be a breaking change. If we 
decide on using both, the complete symbol approach will be 
selected through a Bison directive, like in the other parsers.

An example of the current method, using TokenKind:
https://github.com/akimd/bison/blob/master/examples/d/calc/calc.y#L117

An example using the Symbol struct:
https://github.com/adelavais/bison/blob/complete-external-symbols/examples/d/calc/calc.y#L117
Nov 14 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Nov 14, 2020 at 03:50:23PM +0000, Adela Vais via Digitalmars-d wrote:
[...]
 I need some feedback about the return value of yylex() in Bison's
 Lexer class, which must be provided by the user.
[...]
 An example of the current method, using TokenKind:
 https://github.com/akimd/bison/blob/master/examples/d/calc/calc.y#L117
 
 An example using the Symbol struct:
 https://github.com/adelavais/bison/blob/complete-external-symbols/examples/d/calc/calc.y#L117
Hi Adela,

I took a quick look at the code. I agree that returning Symbol is best because it gives the most friendly API. Generally, returning a struct ought to be quite cheap: for small structs, it could even be returned in CPU registers, so the cost will be minimal.

However, I see that you allocate a new instance of YYLocation each time: that's bound to have performance issues. Is there any reason to allocate YYLocation on the heap? Is it because it's a class as opposed to a struct? If it's a class, what was the rationale behind it?

In my mind, it should be a struct unless there's something in it that must persist on the heap. Based on its construction parameters, it looks to me to be just a container to store start/end positions in the input; if so, it does not need to be a class. A struct will do just fine, and will avoid unnecessary GC allocations.

[...]
 The main disadvantage is the possible overhead the structure adds to
 the parser. It will be created and destroyed for each discovered
 token.
Make it a struct, and make all of its members structs or PODs. Then there will be minimal construction overhead, and no destruction costs at all.

T

-- 
BREAKFAST.COM halted...Cereal Port Not Responding. -- YHL
Nov 16 2020
Adela Vais <adela.vais99 gmail.com> writes:
On Monday, 16 November 2020 at 19:42:43 UTC, H. S. Teoh wrote:
 [...]
 Make it a struct, and make all of its members structs or PODs. 
 Then there will be minimal construction overhead, and no 
 destruction costs at all.
Thank you for the response! I will make this modification.
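
The struct-based layout suggested above could look roughly like the following. This is a minimal sketch under assumptions: `Position`, `Location`, `Symbol`, and `makeNumber` are hypothetical stand-ins, and the real YYLocation in Bison's D skeleton may be shaped differently.

```d
// Hypothetical stand-ins for the types discussed in this thread.
struct Position { int line; int column; }

// A location is just a pair of positions, so a struct is enough.
struct Location
{
    Position begin;
    Position end;
}

enum TokenKind { EOF, NUM }

struct Symbol
{
    TokenKind kind;
    int value;
    Location loc;
}

// @nogc makes the compiler reject any GC allocation in this function,
// so it compiles only because Symbol and Location are plain structs.
Symbol makeNumber(int value, Position begin, Position end) @nogc nothrow pure @safe
{
    return Symbol(TokenKind.NUM, value, Location(begin, end));
}

void main()
{
    auto s = makeNumber(7, Position(1, 1), Position(1, 2));
    assert(s.kind == TokenKind.NUM);
    assert(s.loc.end.column == 2);

    // Plain-data structs have no destructor or postblit to run.
    static assert(__traits(isPOD, Location));
    static assert(__traits(isPOD, Symbol));
}
```

If `Location` were a class instead, `new Location(...)` inside `makeNumber` would fail to compile under `@nogc`, which is one way to check that no per-token heap allocation sneaks back in.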
Nov 18 2020