digitalmars.D.announce - re2d lexer generator

Ulya <skvadrik gmail.com> writes:
Regular expression compiler [re2c](http://re2c.org) now [supports 
D](http://re2c.org/releases/release_notes.html#release-4-0).

A short intro from the official website: *re2c* stands for 
*Regular Expressions to Code*. It is a free and open-source lexer 
generator that supports C, C++, D, Go, Haskell, Java, JavaScript, 
OCaml, Python, Rust, V, and Zig, and can be extended to other 
languages by implementing a single [syntax 
file](http://re2c.org/manual/manual_d.html#syntax-files). The 
primary focus of re2c is on generating *fast* code: it compiles 
regular expressions to deterministic finite automata and 
translates them into direct-coded lexers in the target language 
(such lexers are generally faster and easier to debug than their 
table-driven analogues). The secondary focus of re2c is on 
*flexibility*: it does not assume a fixed program template; 
instead, it allows the user to embed lexers anywhere in the 
source code and configure them to avoid unnecessary buffering and 
bounds checks. The internal algorithm used by re2c is based on a 
special kind of deterministic finite automaton: [lookahead 
TDFA](http://re2c.org/2022_borsotti_trofimovich_a_closer_look_at_tdfa.pdf).
These automata are as fast as ordinary DFA, but they are also capable of
performing submatch extraction with minimal overhead.
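
To make the "direct-coded vs. table-driven" distinction concrete, here is a small hand-written sketch in plain D. It is only an illustration under my own assumptions, not actual re2c output: both functions check whether a whole string matches `[1-9][0-9]*`, and the names `matchTable` and `matchDirect` are made up for this example.

```d
// Hand-written illustration only; this is NOT code generated by re2c.

// Table-driven: the automaton is stored as data and interpreted by a
// generic loop, so every step goes through a table lookup.
bool matchTable(string s)
{
    // Character classes: 0 = '0', 1 = '1'..'9', 2 = anything else.
    // States: 0 = start, 1 = inside a number, 2 = dead (no match).
    static immutable int[3][3] next = [
        [2, 1, 2], // transitions from the start state
        [1, 1, 2], // transitions from the "inside a number" state
        [2, 2, 2], // the dead state is a sink
    ];
    int state = 0;
    foreach (char c; s)
    {
        const int cls = (c == '0') ? 0 : ((c >= '1' && c <= '9') ? 1 : 2);
        state = next[state][cls];
    }
    return state == 1;
}

// Direct-coded: each state is a position in the code, and transitions
// are ordinary comparisons and branches.
bool matchDirect(string s)
{
    size_t i = 0;

    // Start state: exactly one character in '1'..'9' is required.
    if (i == s.length) return false;
    if (s[i] < '1' || s[i] > '9') return false;
    ++i;

    // "Inside a number" state: loop on '0'..'9' until the input ends.
    while (i < s.length)
    {
        if (s[i] < '0' || s[i] > '9') return false;
        ++i;
    }
    return true;
}

void main()
{
    // Both variants must agree on every input.
    foreach (s; ["1234", "7", "", "0123", "12a"])
        assert(matchTable(s) == matchDirect(s));
    assert(matchDirect("1234") && !matchDirect("0123"));
}
```

In the direct-coded variant the current state is implicit in where execution is, so stepping through it in a debugger follows the automaton directly, with no state table to decode.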

There is a [detailed user 
guide](http://re2c.org/manual/manual_d.html) and an [online 
playground](http://re2c.org/playground/?example=d/01_basic.re) 
with many examples.
Nov 25
Sergey <kornburn yandex.ru> writes:
On Monday, 25 November 2024 at 16:01:54 UTC, Ulya wrote:
 a special kind of deterministic finite automata: [lookahead 
 TDFA](http://re2c.org/2022_borsotti_trofimovich_a_closer_look_at_tdfa.pdf).
These automata are as fast as ordinary DFA, but they are also capable of
performing submatch extraction with minimal overhead.

 There is a [detailed user 
 guide](http://re2c.org/manual/manual_d.html) and an [online 
 playground](http://re2c.org/playground/?example=d/01_basic.re) 
 with many examples.
Hi Ulya. I don't have an account on LOR, so I'm glad you wrote here :) Based on some examples from the playground, it seems re2c inserts `#line` directives. I don't think they are supported by D. I checked 'reuse.re', for example.
Nov 25
Ulya <skvadrik gmail.com> writes:
On Monday, 25 November 2024 at 19:18:40 UTC, Sergey wrote:
 Hi Ulya. I don't have an account on LOR, so I'm glad you wrote here :) Based on some examples from the playground, it seems re2c inserts `#line` directives. I don't think they are supported by D. I checked 'reuse.re', for example.
Hi Sergey :) I believe `#line` directives are supported, as described here: https://dlang.org/spec/lex.html#special-token-sequence. All examples are compiled with `dmd -g -wi` and tested to confirm that they produce the expected output: https://github.com/skvadrik/re2c/blob/master/examples/d/__run_all.sh#L26. It is possible to disable line directives for an individual file using `-i`, or to disable them globally with [this setting in the syntax file](https://github.com/skvadrik/re2c/blob/master/include/syntax/d#L31).
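
For anyone who wants to check the `#line` behaviour without re2c, a minimal sketch in plain D might look like this. The file name `lexer.re`, the line number 42, and the helper `here` are made up for the illustration; a generated lexer would point at the real `.re` source instead.

```d
import std.conv : to;

// Hypothetical helper: reports the location the compiler believes it is at.
string here()
{
    // The special token sequence below tells the compiler that the next
    // source line is line 42 of "lexer.re".
#line 42 "lexer.re"
    return __FILE__ ~ ":" ~ to!string(__LINE__);
}

void main()
{
    // __FILE__ and __LINE__ follow the #line directive rather than the
    // real position in this source file.
    assert(here() == "lexer.re:42");
}
```

Compiler diagnostics and debug info for the lines after the directive refer to the remapped location in the same way, which is why a generated lexer can report errors against the original input file.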
Nov 25
Sergey <kornburn yandex.ru> writes:
On Monday, 25 November 2024 at 21:33:35 UTC, Ulya wrote:
 Hi Sergey :)

 I believe `#line` directives are supported, as described here: 
 https://dlang.org/spec/lex.html#special-token-sequence.
Oh cool. I didn't know that, and it is kind of unexpected for me :) Thanks!
Nov 25
Ulya <skvadrik gmail.com> writes:
On Monday, 25 November 2024 at 16:01:54 UTC, Ulya wrote:
 Regular expression compiler [re2c](http://re2c.org) now 
 [supports 
 D](http://re2c.org/releases/release_notes.html#release-4-0).

 [...]
BTW, this is completely different from https://code.dlang.org/packages/re2d. The latter is a set of bindings to the re2 library, while re2c is an ahead-of-time regexp compiler (a port of a tool that has existed since 1993). The name clash is unfortunate.
Nov 25