digitalmars.D.learn - Compile time regex matching
- Jason den Dulk (8/8) Jul 13 2014 Hi
- Philippe Sigaud via Digitalmars-d-learn (37/41) Jul 14 2014 You can try Pegged, a parser generator that works at compile-time
- Jason den Dulk (23/26) Jul 15 2014 I did, and I got it to work. Unfortunately, the code used to in
- Philippe Sigaud via Digitalmars-d-learn (15/21) Jul 15 2014 I never tried that, I'm happy that works.
- Artur Skawina via Digitalmars-d-learn (31/37) Jul 14 2014
- Philippe Sigaud via Digitalmars-d-learn (12/25) Jul 14 2014 Ah, static!
Hi

I am trying to write some code that matches regular expressions at compile time, but the compiler won't let me because matchFirst and matchAll make use of malloc().

Is there an alternative that I can use that can be run at compile time?

Thanks in advance.
Jason
Jul 13 2014
> I am trying to write some code that uses and matches to regular
> expressions at compile time, but the compiler won't let me because
> matchFirst and matchAll make use of malloc().
>
> Is there an alternative that I can use that can be run at compile time?

You can try Pegged, a parser generator that works at compile-time (both the generator and the generated parser).

https://github.com/PhilippeSigaud/Pegged

docs:

https://github.com/PhilippeSigaud/Pegged/wiki/Pegged-Tutorial

It's also on dub:

http://code.dlang.org/packages/pegged

It takes a grammar as input, not a single regular expression, but the syntax is not too different.

import pegged.grammar;

mixin(grammar(`
MyRegex:
    foo <- "abc"* "def"?
`));

void main()
{
    enum result = MyRegex("abcabcdefFOOBAR"); // compile-time parsing

    // everything can be queried and tested at compile-time, if need be.
    static assert(result.matches == ["abc", "abc", "def"]);
    static assert(result.begin == 0);
    static assert(result.end == 9);

    pragma(msg, result.toString()); // parse tree
}

It probably does not implement all those nifty regex features, but it has all the usual powers of Parsing Expression Grammars. It gives you an entire parse result, though: matches, children, subchildren, etc. As you can see, matches are accessible at the top level.

One thing to keep in mind, which comes from the language and not this library: in the previous code, since 'result' is an enum, it'll be 'pasted' in place every time it's used in code: all those static asserts get an entire copy of the parse tree. It's a bit wasteful. Using 'immutable' directly does not work here, but this is OK:

enum res = MyRegex("abcabcdefFOOBAR"); // compile-time parsing
immutable result = res; // to avoid copying the enum value everywhere

The static asserts then work (not the toString, though). Maybe someone more knowledgeable than me on DMD internals could confirm that it indeed avoids re-allocating those parse results.
Jul 14 2014
On Monday, 14 July 2014 at 11:43:01 UTC, Philippe Sigaud via Digitalmars-d-learn wrote:
> You can try Pegged, a parser generator that works at compile-time
> (both the generator and the generated parser).

I did, and I got it to work. Unfortunately, the code used in CTFE is left in the final executable even though it is not used at runtime. So now the question is, is there a way to get rid of the excess baggage?

BTW here is the code I am playing with.

import std.stdio;

string get_match()
{
    import pegged.grammar;

    mixin(grammar(`
    MyRegex:
        foo <- "abc"* "def"?
    `));

    // import() reads the file at compile time (needs the -J compiler flag)
    auto result = MyRegex(import("config-file.txt")); // compile-time parsing
    return "writeln(\""~result.matches[0]~"\");";
}

void main()
{
    mixin(get_match());
}
Jul 15 2014
> I did, and I got it to work. Unfortunately, the code used to in the CTFE
> is left in the final executable even though it is not used at runtime. So
> now the question is, is there away to get rid of the excess baggage?

Not that I know of. Once code is injected, it's compiled into the executable.

> auto result = MyRegex(import("config-file.txt")); // compile-time parsing
> return "writeln(\""~result.matches[0]~"\");";

> mixin(get_match());

I never tried that, I'm happy that works.

Another solution would be to push these actions to runtime, by using a small script instead of your compilation command. This script can be in D.

- The script takes a file name as input
- Open the file
- Use regex to parse it
- Extract the values you want and write them to a temporary file.
- Invoke the compiler (with std.process) on your main file, with the -Jpath flag so it can find the temporary file. Inside your real code, you can thus use mixin(import("temp file")) happily.
- Delete the temporary file once the previous step is finished.

Compile the script once and for all; it should execute quite rapidly. It's an unusual pre-processor, in a way.
Jul 15 2014
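The pre-processing steps listed in the post above might be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the thread: the script name prebuild.d, the generated file name generated.d, the pattern `abc|def`, and the helper generateCode are all made up for the example, and it assumes dmd is on the PATH.

```d
// prebuild.d: hypothetical pre-build script (names and pattern are assumptions).
import std.file : readText, remove, tempDir, write;
import std.path : buildPath;
import std.process : execute;
import std.regex : matchFirst, regex;
import std.stdio : stderr;

/// Turns the first match found in `text` into one line of D source,
/// or returns null when nothing matches.
string generateCode(string text)
{
    auto m = matchFirst(text, regex(`abc|def`)); // runtime regex, no CTFE involved
    if (m.empty)
        return null;
    // m.hit can only be "abc" or "def" here, so no string escaping is needed.
    return `writeln("` ~ m.hit ~ `");`;
}

int main(string[] args)
{
    if (args.length != 3)
    {
        stderr.writeln("usage: prebuild <config-file> <main-file.d>");
        return 1;
    }

    // Open the input file and extract the value with std.regex.
    immutable code = generateCode(readText(args[1]));
    if (code.length == 0)
    {
        stderr.writeln("no match found in ", args[1]);
        return 1;
    }

    // Write the generated code to a temporary file...
    immutable dir = tempDir();
    immutable generated = buildPath(dir, "generated.d");
    write(generated, code);
    scope (exit) remove(generated); // ...and delete it once compilation is done.

    // Invoke the compiler; -J tells it where string imports may be read from.
    auto dmd = execute(["dmd", "-J" ~ dir, args[2]]);
    stderr.write(dmd.output);
    return dmd.status;
}
```

Under these assumptions, the main file would contain `import std.stdio;` and `mixin(import("generated.d"));` inside main(), and neither Pegged nor std.regex would end up in the final executable, since only the script uses them.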
On 07/14/14 13:42, Philippe Sigaud via Digitalmars-d-learn wrote:
> asserts get an entire copy of the parse tree. It's a bit wasteful, but
> using 'immutable' directly does not work here, but this is OK:
>
> enum res = MyRegex("abcabcdefFOOBAR"); // compile-time parsing
> immutable result = res; // to avoid copying the enum value everywhere

static immutable result = MyRegex("abcabcdefFOOBAR"); // compile-time parsing

> The static asserts then works (not the toString, though). Maybe

diff --git a/pegged/peg.d b/pegged/peg.d
index 98959294c40e..307e8a14b1dd 100644
--- a/pegged/peg.d
+++ b/pegged/peg.d
@@ -55,7 +55,7 @@ struct ParseTree
     /**
     Basic toString for easy pretty-printing.
     */
-    string toString(string tabs = "")
+    string toString(string tabs = "") const
     {
         string result = name;
 
@@ -262,7 +262,7 @@ Position position(string s)
 /**
 Same as previous overload, but from the begin of P.input to p.end
 */
-Position position(ParseTree p)
+Position position(const ParseTree p)
 {
     return position(p.input[0..p.end]);
 }

[completely untested; just did a git clone and fixed the two errors the compiler was whining about. Hmm, did pegged get faster? Last time i tried (years ago) it was unusably slow; right now, compiling your example, i didn't notice the extra multi-second delay that was there then.]

artur
Jul 14 2014
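To illustrate why the `const` qualifiers in the diff above matter, here is a minimal sketch that does not use Pegged at all. The Tree struct and the hand-written parse function are hypothetical stand-ins for Pegged's ParseTree and generated parser; only the `static immutable` initialisation and the `const` on toString mirror what the post describes.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a parse tree; not Pegged's actual type.
struct Tree
{
    string name;
    string[] matches;

    // Without `const`, this method could not be called on an immutable Tree.
    string toString() const
    {
        string result = name;
        foreach (m; matches)
            result ~= " " ~ m;
        return result;
    }
}

// Hand-written matcher for "abc"* "def"?, simple enough to run in CTFE.
Tree parse(string input) pure
{
    auto t = Tree("foo");
    while (input.length >= 3 && input[0 .. 3] == "abc")
    {
        t.matches ~= "abc";
        input = input[3 .. $];
    }
    if (input.length >= 3 && input[0 .. 3] == "def")
        t.matches ~= "def";
    return t;
}

void main()
{
    // A static initializer must be known at compile time, so parse() runs in CTFE.
    static immutable result = parse("abcabcdefFOOBAR");

    static assert(result.matches == ["abc", "abc", "def"]);
    static assert(result.toString() == "foo abc abc def");

    writeln(result.toString());
}
```

Removing `const` from toString in this sketch should make both toString calls fail to compile, which is the kind of error the diff addresses for ParseTree.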
On Mon, Jul 14, 2014 at 3:19 PM, Artur Skawina via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> wrote:
> On 07/14/14 13:42, Philippe Sigaud via Digitalmars-d-learn wrote:
>> asserts get an entire copy of the parse tree. It's a bit wasteful, but
>> using 'immutable' directly does not work here, but this is OK:
>>
>> enum res = MyRegex("abcabcdefFOOBAR"); // compile-time parsing
>> immutable result = res; // to avoid copying the enum value everywhere
>
> static immutable result = MyRegex("abcabcdefFOOBAR"); // compile-time parsing

Ah, static!

>> The static asserts then works (not the toString, though). Maybe

(snip diff)

I'll push that to the repo, thanks! I should sprinkle some const and pure everywhere...

> [completely untested; just did a git clone and fixed the two errors the
> compiler was whining about. Hmm, did pegged get faster? Last time i
> tried (years ago) it was unusably slow; right now, compiling your
> example, i didn't notice the extra multi-second delay that was there
> then.]

It's still slower than some handcrafted parsers. At one point, I could get it on par with std.regex (between 1.3 and 1.8 times slower), but that meant losing some other properties. I have other parsing engines partially implemented, with either a larger spectrum of grammars or better speed (but not both!). I hope the coming holidays will let me go back to it.
Jul 14 2014