digitalmars.D.learn - Parser
- Cecil Ward (38/38) Jun 14 2023 I’m thinking that I might have to end up writing a partial, rather
- Ben Jones (10/15) Jun 15 2023 A couple of pointers for general parsers:
I’m thinking that I might have to end up writing a partial, rather rough parser for parts of the D language. Could I get some suggestions for software components that might help? D has a very powerful regex module, I believe.

I have been writing inline asm library routines for GDC as a learning exercise, and unfortunately I can’t build them under LDC because LDC does not yet offer full support for the GCC inline asm grammar, specifically named in-asm arguments such as "mov %[dest], %[src]", where the names are enclosed in [ ]. I’m thinking that I might have to fix this deficiency myself. There’s no way that I can enhance LDC itself, as I wouldn’t even know where to start. I could pre-process the string expressions used in inline asm so that LDC sees an alternative, easier grammar, one with numbers instead of "[names]", e.g. "%0" instead of the meaningful "%[dest]".

It seems that the compilers take string _expressions_ everywhere rather than just simple string literals. Can I generate fragments of D and inject them into the rest of the code using mixin? I’m not really sure how to use it.

There are three string expressions involved: the string containing the asm, which needs to be scanned for %[ names ], each of which must be replaced with a number given by the order in which the names are declared; then an outputs section and an inputs section, both of which can contain declarations of these names, e.g. ‘: [ dest ] "=r" ( d-expression ) ,’ … ‘: [ src ]’…. The arbitrary fragment of D in d-expression can unfortunately be anything, and there’s no way I can write a full D lexer/parser to scan that properly, but luckily I just have to pass over it to find its terminator, which is either a ‘,’ or a ‘:’. (There might be a case where there is a ‘;’ as a terminator instead of a ‘:’; I’m not sure if that’s permitted in the grammar immediately after the inputs section.)
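The renumbering step described above can be sketched as an ordinary D function that the compiler evaluates at compile time (CTFE). This is only a minimal sketch under stated assumptions: the name numberOperands is made up for illustration, and the operand names are assumed to have been collected already, in declaration order, from the outputs and inputs sections.

```d
import std.conv : to;

/// Hypothetical helper: rewrites each "%[name]" in `tmpl` to "%N", where N is
/// the position of `name` in `names` (assumed to be in declaration order).
string numberOperands(string tmpl, const string[] names)
{
    string result;
    size_t i = 0;
    while (i < tmpl.length)
    {
        // Look for the two-character opener "%[".
        if (tmpl[i] == '%' && i + 1 < tmpl.length && tmpl[i + 1] == '[')
        {
            // Find the bracket that closes the operand name.
            size_t close = i + 2;
            while (close < tmpl.length && tmpl[close] != ']')
                ++close;
            assert(close < tmpl.length, "unterminated %[ in asm template");

            string name = tmpl[i + 2 .. close];

            // Linear search for the name among the declared operands.
            ptrdiff_t index = -1;
            foreach (k, n; names)
            {
                if (n == name)
                {
                    index = k;
                    break;
                }
            }
            assert(index >= 0, "undeclared asm operand: " ~ name);

            result ~= "%" ~ index.to!string;
            i = close + 1;
        }
        else
        {
            // Anything else is copied through unchanged.
            result ~= tmpl[i];
            ++i;
        }
    }
    return result;
}

// Forcing compile-time evaluation: the result is a plain string constant.
enum rewritten = numberOperands("mov %[dest], %[src]", ["dest", "src"]);
static assert(rewritten == "mov %0, %1");

unittest
{
    // A name may be referenced more than once.
    assert(numberOperands("add %[a], %[a]", ["x", "a"]) == "add %1, %1");
}
```

Because the result is a compile-time constant, it can be used anywhere D accepts a compile-time string expression, for example inside mixin(...). Whether LDC accepts the rewritten template in a particular asm statement is not something this sketch checks.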
But having to parse all the types of strings and operators in a string-expression is hard enough. I will also have to deal with all the possible comment types wherever they can occur, which is all over the place: within, before and after these expressions. Any tips or modules that I could use would be most welcome. I’m very much out of my depth here.
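Passing over a d-expression without fully parsing it can be sketched as a small hand-written scanner that only tracks bracket nesting and skips literals and comments. This is a rough sketch, not a D lexer: the name findTerminator is made up for illustration, and it assumes only "..." strings, `...` strings, character literals and the three comment forms need skipping. Other literal forms, such as r"..." and q"..." strings, are not handled.

```d
/// Hypothetical helper: returns the index of the first ',' ':' or ';' that is
/// outside all brackets, literals and comments, or src.length if there is none.
size_t findTerminator(string src, size_t start = 0)
{
    size_t i = start;
    int depth = 0; // nesting level of (), [] and {}

    while (i < src.length)
    {
        const c = src[i];
        const next = i + 1 < src.length ? src[i + 1] : '\0';

        if (c == '/' && next == '/')           // line comment: skip to end of line
        {
            while (i < src.length && src[i] != '\n')
                ++i;
        }
        else if (c == '/' && next == '*')      // block comment, does not nest
        {
            i += 2;
            while (i + 1 < src.length && !(src[i] == '*' && src[i + 1] == '/'))
                ++i;
            i += 2;
        }
        else if (c == '/' && next == '+')      // nesting comment
        {
            int level = 1;
            i += 2;
            while (i + 1 < src.length && level > 0)
            {
                if (src[i] == '/' && src[i + 1] == '+') { ++level; i += 2; }
                else if (src[i] == '+' && src[i + 1] == '/') { --level; i += 2; }
                else ++i;
            }
            if (level > 0)
                i = src.length;                // unterminated comment: give up
        }
        else if (c == '"' || c == '\'' || c == '`') // string or character literal
        {
            ++i;
            while (i < src.length && src[i] != c)
            {
                // A backslash escapes the next character, except inside `...`.
                i += (src[i] == '\\' && c != '`') ? 2 : 1;
            }
            ++i;                               // step past the closing quote
        }
        else if (c == '(' || c == '[' || c == '{') { ++depth; ++i; }
        else if (c == ')' || c == ']' || c == '}') { --depth; ++i; }
        else if (depth == 0 && (c == ',' || c == ':' || c == ';'))
            return i;
        else
            ++i;
    }
    return src.length;
}

unittest
{
    // The ':' of the ternary and the ',' in the comment are both skipped.
    string s = `[dest] "=r" (a ? b : c /* , */), [src] "r" (x)`;
    assert(s[findTerminator(s) .. $] == `, [src] "r" (x)`);
}
```

Since the d-expression in an operand is enclosed in parentheses, tracking nesting depth is what keeps a ternary's ‘:’ or a function call's ‘,’ from being mistaken for the terminator.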
Jun 14 2023
On Wednesday, 14 June 2023 at 09:28:57 UTC, Cecil Ward wrote:
> I’m thinking that I might have to end up writing a partial, rather rough parser for parts of the D language. Could I get some suggestions for help that I might find in the way of software components? D has a very powerful regex module, I believe.

A couple of pointers for general parsers:

The Pegged library (https://github.com/PhilippeSigaud/Pegged/tree/master) is pretty popular for building general parsers.

I have a rough implementation of a similar idea here: https://github.com/benjones/autoparsed (definitely less polished and probably buggier than Pegged). In mine you annotate your types with their syntax and can then call parse!MyType on a token stream to get a MyType.
Jun 15 2023