digitalmars.D.learn - How to avoid ctRegex (solved)
- cy (66/66) Aug 21 2016 At seconds PER (character range) pattern, ctRegex slows down
- ag0aep6g (3/10) Aug 21 2016 I may be missing the point here, but just putting `auto pattern =
- cy (4/6) Aug 22 2016 Really? I thought global variables could only be initialized with
- ag0aep6g (3/9) Aug 22 2016 That's true, and apparently `regex("foobar")` can be evaluated at
- cy (2/4) Aug 23 2016 Then what's ctRegex in there for at all...?
- ag0aep6g (8/9) Aug 23 2016 Optimization.
- Seb (4/15) Aug 24 2016 Yep, that's why ctRegex is 2x faster than the highly-tuned grep,
- cy (10/15) Aug 27 2016 It's not using it with a compile time constant that struck me as
- Dicebot (5/14) Aug 27 2016 But actual value of that Regex struct is perfectly known during
- David Nadlinger (7/11) Aug 27 2016 Yes, regex() is CTFEable, but this still comes at a significant
- ag0aep6g (8/13) Aug 27 2016 No, that's not right. The initializer for a module level variable has to...
At seconds PER (character range) pattern, ctRegex slows down compilation like crazy, but it's not obvious how to avoid using it, since Regex(Char) is kind of weird for a type. So, here's what I do. I think this is right.

In the module scope, you start with:

    auto pattern = ctRegex!"foobar";

and you substitute with:

    typeof(regex("")) pattern;
    static this() {
        pattern = regex("foobar");
    }

That way you don't have to worry about whether to use a Regex!char, or a Regex!dchar, or a Regex!ubyte. It gives you the same functionality, at the cost of a few microseconds' slowdown when running your program.

And once you're done debugging, you can always switch back, so...

    string defineRegex(string name, string pattern)() {
        import std.string: replace;
        return q{
            debug {
                pragma(msg, "fast $name");
                import std.regex: regex;
                typeof(regex("")) $name;
                static this() {
                    $name = regex(`$pattern`);
                }
            } else {
                pragma(msg, "slooow $name");
                import std.regex: ctRegex;
                auto $name = ctRegex!`$pattern`;
            }
        }.replace("$pattern",pattern)
         .replace("$name",name);
    }

    mixin(defineRegex!("naword",r"[\W]+"));
    mixin(defineRegex!("alnum",r"[a-zA-Z]+"));
    mixin(defineRegex!("pattern","foo([a-z]*?)bar"));
    mixin(defineRegex!("pattern2","foobar([^0-9z]+)"));

    void main() {
    }

    /*
    $ time rdmd -release /tmp/derp.d
    slooow naword
    slooow alnum
    slooow pattern
    slooow pattern2
    slooow naword
    slooow alnum
    slooow pattern
    slooow pattern2
    rdmd -release /tmp/derp.d  17.57s user 1.57s system 82% cpu 23.210 total

    $ time rdmd -debug /tmp/derp.d
    fast naword
    fast alnum
    fast pattern
    fast pattern2
    fast naword
    fast alnum
    fast pattern
    fast pattern2
    rdmd -debug /tmp/derp.d  2.92s user 0.37s system 71% cpu 4.623 total
    */

...sure would be nice if you could cache precompiled regular expressions as files.
Aug 21 2016
On 08/21/2016 10:06 PM, cy wrote:
> in the module scope, you start with:
>     auto pattern = ctRegex!"foobar";
> and you substitute with:
>     typeof(regex("")) pattern;
>     static this() { pattern = regex("foobar"); }

I may be missing the point here, but just putting `auto pattern = regex("foobar");` at module level works for me.
Aug 21 2016
On Sunday, 21 August 2016 at 21:18:11 UTC, ag0aep6g wrote:
> I may be missing the point here, but just putting `auto pattern = regex("foobar");` at module level works for me.

Really? I thought global variables could only be initialized with static stuff available during compile time, and you needed a "static this() {}" block to initialize them otherwise.
Aug 22 2016
On 08/23/2016 06:06 AM, cy wrote:
> On Sunday, 21 August 2016 at 21:18:11 UTC, ag0aep6g wrote:
>> I may be missing the point here, but just putting `auto pattern = regex("foobar");` at module level works for me.
>
> Really? I thought global variables could only be initialized with static stuff available during compile time, and you needed a "static this() {}" block to initialize them otherwise.

That's true, and apparently `regex("foobar")` can be evaluated at compile time.
Aug 22 2016
On Tuesday, 23 August 2016 at 04:51:19 UTC, ag0aep6g wrote:
> That's true, and apparently `regex("foobar")` can be evaluated at compile time.

Then what's ctRegex in there for at all...?
Aug 23 2016
On 08/24/2016 03:07 AM, cy wrote:
> Then what's ctRegex in there for at all...?

Optimization.

ctRegex requires that the pattern is available as a compile time constant. It uses that property to "generate optimized native machine code".

The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".
Aug 23 2016
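The distinction ag0aep6g draws above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function names `hasFoobarStatic` and `matchesRuntime` are made up for the example, and the patterns are arbitrary. The point is that `ctRegex` takes its pattern as a template argument, so it must be a compile-time constant, while `regex` accepts an ordinary string value.

```d
import std.regex : ctRegex, matchFirst, regex;

// Hypothetical helper: the pattern is a compile-time constant, so ctRegex
// can be used and the matcher is generated while compiling.
bool hasFoobarStatic(string input)
{
    auto re = ctRegex!`foo([a-z]*?)bar`;
    return !matchFirst(input, re).empty;
}

// Hypothetical helper: the pattern is only known at run time (for example,
// it came from user input), so only the plain regex() function applies.
bool matchesRuntime(string input, string userPattern)
{
    auto re = regex(userPattern);
    return !matchFirst(input, re).empty;
}

void main()
{
    assert(hasFoobarStatic("xx fooabcbar yy"));
    assert(!hasFoobarStatic("nothing here"));
    assert(matchesRuntime("abc123", `[0-9]+`));
    assert(!matchesRuntime("abc", `[0-9]+`));
}
```

Passing a run-time string to `ctRegex!` would not compile, which is the requirement described in the post.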
On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
> On 08/24/2016 03:07 AM, cy wrote:
>> Then what's ctRegex in there for at all...?
>
> Optimization.
>
> ctRegex requires that the pattern is available as a compile time constant. It uses that property to "generate optimized native machine code".
>
> The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".

Yep, that's why ctRegex is 2x faster than the highly-tuned grep, e.g. https://github.com/dlang/phobos/pull/4286
Aug 24 2016
On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
> The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".

It's not using it with a compile time constant that struck me as weird. It's using it to assign a global variable that struck me as weird.

When I saw `auto a = b;` at the module level, I thought that b had to be something you could evaluate at compile time. But I guess it can be a runtime calculated value, acting like it was assigned in a static this() clause, and the requirement for it to be compile time generated is only for immutable? Like `immutable auto a = b`?
Aug 27 2016
On Saturday, 27 August 2016 at 17:35:04 UTC, cy wrote:
> On Wednesday, 24 August 2016 at 05:29:57 UTC, ag0aep6g wrote:
>> The plain regex function doesn't have such a requirement. It also works with a pattern that's generated at run time, e.g. from user input. But you can use it with a compile time constant, too. And it works in CTFE then, but it does not "generate optimized native machine code".
>
> It's not using it with a compile time constant that struck me as weird. It's using it to assign a global variable that struck me as weird.

But actual value of that Regex struct is perfectly known during compile time. Thus it is possible and fine to use it as initializer. You can use any struct or class as initializer if it can be computed during compile-time.
Aug 27 2016
On Saturday, 27 August 2016 at 17:47:33 UTC, Dicebot wrote:
> But actual value of that Regex struct is perfectly known during compile time. Thus it is possible and fine to use it as initializer. You can use any struct or class as initializer if it can be computed during compile-time.

Yes, regex() is CTFEable, but this still comes at a significant compile-time cost as the constructor does quite a bit of string manipulation, etc. I've seen this, i.e. inconsiderate use of regex() globals, cost tens of seconds in build time for bigger codebases.

 - David
Aug 27 2016
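One way to sidestep the build-time cost David describes is to make sure `regex()` is never used as a module-level initializer, so the compiler has no reason to run it through CTFE. The `static this()` approach from the first post does that. Another possibility is sketched below, assuming a lazily initialized accessor is acceptable. The name `wordPattern` and the pattern are invented for the example, and this is not something proposed in the thread.

```d
import std.regex : Regex, matchFirst, regex;

// Hypothetical accessor: the regex is built on first use at run time.
// The static locals only need their default values at compile time,
// so regex() itself is never evaluated by the compiler.
ref Regex!char wordPattern()
{
    static Regex!char compiled;   // thread-local, starts as Regex!char.init
    static bool ready = false;
    if (!ready)
    {
        compiled = regex(`[a-zA-Z]+`);
        ready = true;
    }
    return compiled;
}

void main()
{
    auto m = matchFirst("42 apples", wordPattern());
    assert(!m.empty);
    assert(m.hit == "apples");
    assert(matchFirst("12345", wordPattern()).empty);
}
```

The trade-off is the same one noted in the first post: a small run-time cost on first use in exchange for a faster build.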
On 08/27/2016 07:35 PM, cy wrote:
> When I saw `auto a = b;` at the module level, I thought that b had to be something you could evaluate at compile time.

That's right.

> But I guess it can be a runtime calculated value, acting like it was assigned in a static this() clause,

No, that's not right. The initializer for a module level variable has to be a compile-time constant. If the initializer is a function call, the compiler attempts to evaluate it at compile time. We have an acronym for that: CTFE = Compile Time Function Evaluation. `regex("foobar")` can be evaluated that way, so it can be used as an initializer for a module level variable.
Aug 27 2016
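The CTFE rule from the last post can be seen with a much smaller function than `regex`. The sketch below uses a made-up helper, `sumOfSquares`, purely for illustration: it is an ordinary function with no special annotations, yet the compiler evaluates it when it appears as a module-level initializer.

```d
// Hypothetical helper: a plain function, nothing marks it as compile-time.
int sumOfSquares(int n)
{
    int total = 0;
    foreach (i; 1 .. n + 1)
        total += i * i;
    return total;
}

// Module-level initializer: the compiler has to produce a constant here,
// so it runs sumOfSquares(4) through CTFE during compilation.
int precomputed = sumOfSquares(4);

// static assert forces compile-time evaluation as well.
static assert(sumOfSquares(4) == 30);

void main()
{
    assert(precomputed == 30);

    // The same function still works with a run-time argument.
    int n = 3;
    assert(sumOfSquares(n) == 14);
}
```

If the function could not be evaluated at compile time (for instance because it performs I/O), the module-level initializer would be rejected by the compiler, and a `static this()` block would be needed instead.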