digitalmars.D.learn - Poor regex performance?
- Julian (46/46) Apr 04 2019 The following code, that just runs a regex against a large exim
- rikki cattermole (2/2) Apr 04 2019 If you need performance use ldc not dmd (assumed).
- Julian (6/8) Apr 04 2019 Thanks! I already had dmd installed from a brief look at D a long
- Stefan Koch (5/16) Apr 04 2019 You need to disable the GC.
- Jon Degenhardt (6/17) Apr 04 2019 Try:
- XavierAP (3/6) Apr 04 2019 Configuration variable "DFLAGS". On Windows you can specify it in
- H. S. Teoh (8/9) Apr 04 2019 [...]
When compiled with just -O, the following code, which just runs a regex against a large exim log to report on the top senders, is 140 times slower than similar C code using PCRE. With a bunch of other flags I got it down to only 13x slower than C code that uses libc regcomp/regexec.

    import std.stdio, std.string, std.regex, std.array, std.algorithm;

    T min(T)(T a, T b) {
        if (a < b) return a;
        return b;
    }

    void main() {
        ulong[string] emailcounts;
        // Pattern compiled at compile time via ctRegex.
        auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^ ]+ (\S+))");

        foreach (line; File("exim_mainlog").byLine()) {
            auto m = line.match(re);
            if (m) {
                // byLine reuses its buffer, so the captured key must be copied.
                ++emailcounts[m.front[1].idup];
            }
        }

        // Sort senders by message count, highest first, and print the top five.
        string[] senders = emailcounts.keys;
        sort!((a, b) { return emailcounts[a] > emailcounts[b]; })(senders);
        foreach (i; 0 .. min(senders.length, 5)) {
            writefln("%5s %s", emailcounts[senders[i]], senders[i]);
        }
    }

Other code is available at https://github.com/jrfondren/topsender-bench

With PCRE and getline(), I get D down to only 1.2x slower.

I wrote this part of the way through chapter 1 of "The D Programming Language", so my question is mainly: is this a fair result? Is std.regex very slow, and should I reach for PCRE if regex speed matters? Or is this code severely flawed somehow? I'm using a random production log; I'm not trying to make things difficult.

Relatedly, how can I add custom compiler flags to rdmd, in a D script? For example, -L-lpcre
Apr 04 2019
If you need performance, use ldc rather than dmd (I'm assuming you used dmd). LLVM's optimizer generates much better code than dmd's does.
Apr 04 2019
On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote:
> If you need performance use ldc not dmd (assumed). LLVM has many factors better code optimizes than dmd does.

Thanks! I already had dmd installed from a brief look at D a long time ago, so I missed the details at https://dlang.org/download.html

ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE.
Apr 04 2019
On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote:
> [...]
> ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE.

You need to disable the GC by importing it and calling GC.disable():

    import core.memory : GC;
    GC.disable();

The next thing is to avoid the .idup and cast to string instead.
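A minimal sketch of both suggestions, assuming the OP's pattern and a log path passed on the command line (the countSenders helper and the argument handling are hypothetical, not from the thread). It calls GC.disable() before the loop. Rather than storing a cast(string) slice as the key, which would generally be unsafe because File.byLine reuses its line buffer and later lines would overwrite stored keys, it casts only for the temporary lookup and calls .idup once per distinct sender. It also swaps the OP's match for matchFirst. No timings for these changes were posted, so the gain has to be measured.

```d
import core.memory : GC;
import std.algorithm : min, sort;
import std.regex : ctRegex, matchFirst;
import std.stdio : File, stderr, writefln;

/// Hypothetical helper: counts capture group 1 of `re` over a range of lines.
/// A key is copied with .idup only the first time a sender is seen.
ulong[string] countSenders(R, Re)(R lines, Re re)
{
    ulong[string] counts;
    foreach (line; lines)
    {
        auto m = matchFirst(line, re);
        if (!m)
            continue;
        auto key = m[1];
        // The cast is only used for this lookup and is never stored, so it
        // does not matter that byLine will later overwrite the buffer.
        if (auto p = (cast(string) key) in counts)
            ++*p;
        else
            counts[key.idup] = 1; // first sighting: store a private copy
    }
    return counts;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: topsender LOGFILE");
        return;
    }

    // Suppresses automatic collections; the GC may still collect if memory runs out.
    GC.disable();

    auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^ ]+ (\S+))");
    auto counts = countSenders(File(args[1]).byLine(), re);

    // Sort senders by count, highest first, and print the top five.
    string[] senders = counts.keys;
    senders.sort!((a, b) => counts[a] > counts[b]);
    foreach (s; senders[0 .. min(senders.length, 5)])
        writefln("%5s %s", counts[s], s);
}
```

With a log that has few distinct senders, this turns one allocation per matching line into one allocation per sender.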
Apr 04 2019
On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote:
> [...]
> ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE.

Try:

    ldc2 -O3 -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -enable-inlining

This will improve inlining and optimization across the runtime library boundaries, which can help in certain types of code.
Apr 04 2019
On Thursday, 4 April 2019 at 09:53:06 UTC, Julian wrote:
> Relatedly, how can I add custom compiler flags to rdmd, in a D script? For example, -L-lpcre

Use the configuration variable "DFLAGS". On Windows you can specify it in the sc.ini file. On Linux, see https://dlang.org/dmd-linux.html
Apr 04 2019
On Thu, Apr 04, 2019 at 09:53:06AM +0000, Julian via Digitalmars-d-learn wrote:
[...]
> auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^ ]+ (\S+))");
[...]

ctRegex is a crock; use regex() instead and it might actually work better.


T

-- 
Stop staring at me like that! It's offens... no, you'll hurt your eyes!
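A minimal sketch of that suggestion, assuming the rest of the OP's program stays the same: the pattern is compiled once at run time with regex(), outside the per-line loop, and matchFirst is swapped in for the OP's match. The countSenders helper and reading the log path from the command line are hypothetical. The thread does not report timings for this change, so whether it beats ctRegex on a given log has to be measured.

```d
import std.algorithm : min, sort;
import std.regex : matchFirst, regex;
import std.stdio : File, stderr, writefln;

/// Hypothetical helper: counts capture group 1 of the OP's pattern over a range of lines.
ulong[string] countSenders(R)(R lines)
{
    // Compiled once per call at run time, instead of at compile time with ctRegex.
    auto re = regex(r"(?:\S+ ){3,4}<= ([^ ]+ (\S+))");
    ulong[string] counts;
    foreach (line; lines)
    {
        if (auto m = matchFirst(line, re))
            ++counts[m[1].idup]; // copy the key: byLine reuses its buffer
    }
    return counts;
}

void main(string[] args)
{
    if (args.length < 2)
    {
        stderr.writeln("usage: topsender LOGFILE");
        return;
    }

    auto counts = countSenders(File(args[1]).byLine());

    // Sort senders by count, highest first, and print the top five.
    string[] senders = counts.keys;
    senders.sort!((a, b) => counts[a] > counts[b]);
    foreach (s; senders[0 .. min(senders.length, 5)])
        writefln("%5s %s", counts[s], s);
}
```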
Apr 04 2019