digitalmars.D.learn - Poor regex performance?

Julian <julian.fondren gmail.com> writes:
The following code, which just runs a regex against a large exim log to report on top senders, is 140 times slower than similar C code using PCRE when compiled with just -O. With a bunch of other flags I got it down to only 13x slower than C code that's using libc regcomp/regexec.

   import std.stdio, std.string, std.regex, std.array, std.algorithm;

   T min(T)(T a, T b) {
           if (a < b) return a;
           return b;
   }

   void main() {
           ulong[string] emailcounts;
           auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^ ]+ (\S+))");

           foreach (line; File("exim_mainlog").byLine()) {
                   auto m = line.match(re);
                   if (m) {
                           ++emailcounts[m.front[1].idup];
                   }
           }

           string[] senders = emailcounts.keys;
           sort!((a, b) { return emailcounts[a] > emailcounts[b]; })(senders);
           foreach (i; 0 .. min(senders.length, 5)) {
                   writefln("%5s %s", emailcounts[senders[i]], senders[i]);
           }
   }

Other code's available at https://github.com/jrfondren/topsender-bench
I get D down to 1.2x slower with PCRE and getline().

I wrote this part of the way through chapter 1 of "The D Programming Language", so my question is mainly: is this a fair result? Is std.regex very slow, so that I should reach for PCRE if regex speed matters? Or is this code severely flawed somehow? I'm using a random production log; I'm not trying to make things difficult.

Relatedly, how can I add custom compiler flags to rdmd, in a D script? For example, -L-lpcre
Apr 04 2019
rikki cattermole <rikki cattermole.co.nz> writes:
If you need performance, use ldc rather than dmd (which I assume you are using).

LLVM's code optimizers are many times better than dmd's.
Apr 04 2019
Julian <julian.fondren gmail.com> writes:
On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote:
 If you need performance, use ldc rather than dmd (which I assume you are using).

 LLVM's code optimizers are many times better than dmd's.

Thanks! I already had dmd installed from a brief look at D a long time ago, so I missed the details at https://dlang.org/download.html

ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE.
Apr 04 2019
Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote:
 On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote:
 If you need performance, use ldc rather than dmd (which I assume you are using).

 LLVM's code optimizers are many times better than dmd's.

 Thanks! I already had dmd installed from a brief look at D a long time ago, so I missed the details at https://dlang.org/download.html

 ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE.

You need to disable the GC by importing core.memory : GC and calling GC.disable(). The next thing is to avoid the .idup and cast to string instead.
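
A minimal sketch of one way to cut the per-match allocation, assuming the original `ulong[string] emailcounts` table. The helper name `countSender` is hypothetical and not from the thread. It is a swapped-in variation on the advice above rather than a plain cast: `byLine` reuses its line buffer, so keys that merely alias that buffer may be overwritten by the next line. The sketch therefore looks the slice up first and copies it only when the key is new.

```d
import core.memory : GC;

// Hypothetical helper: bump the counter for `sender`, allocating a
// string only the first time a given key is seen.
void countSender(ref ulong[string] counts, const(char)[] sender)
{
    if (auto p = sender in counts)
        ++*p;                     // key already present: no allocation
    else
        counts[sender.idup] = 1;  // new key: take an immutable copy
}

// Usage inside the original loop (sketch):
//     GC.disable();              // once, before the loop
//     if (m) countSender(emailcounts, m.front[1]);
```

Disabling the collector trades memory for speed, so this is probably only reasonable for a short-lived program like this one.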
Apr 04 2019
Jon Degenhardt <jond noreply.com> writes:
On Thursday, 4 April 2019 at 10:31:43 UTC, Julian wrote:
 On Thursday, 4 April 2019 at 09:57:26 UTC, rikki cattermole wrote:
 If you need performance, use ldc rather than dmd (which I assume you are using).

 LLVM's code optimizers are many times better than dmd's.

 Thanks! I already had dmd installed from a brief look at D a long time ago, so I missed the details at https://dlang.org/download.html

 ldc2 -O3 does a lot better, but the result is still 30x slower without PCRE.

Try:

   ldc2 -O3 -release -flto=thin -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -enable-inlining

This will improve inlining and optimization across the runtime library boundaries. This can help in certain types of code.
Apr 04 2019
XavierAP <n3minis-git yahoo.es> writes:
On Thursday, 4 April 2019 at 09:53:06 UTC, Julian wrote:
 Relatedly, how can I add custom compiler flags to rdmd, in a D script? For example, -L-lpcre

Use the configuration variable "DFLAGS". On Windows you can specify it in the sc.ini file. On Linux it goes in dmd.conf; see https://dlang.org/dmd-linux.html
Apr 04 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 04, 2019 at 09:53:06AM +0000, Julian via Digitalmars-d-learn wrote:
[...]
           auto re = ctRegex!(r"(?:\S+ ){3,4}<= ([^ ]+ (\S+))");
[...]

ctRegex is a crock; use regex() instead and it might actually work better.

T

-- 
Stop staring at me like that! It's offens... no, you'll hurt your eyes!
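
As a rough sketch of the suggestion above, the pattern can be built at run time with `regex()` and passed to `matchFirst`. The helper name `senderOf` is hypothetical, and whether this is actually faster than `ctRegex` on a given log would need to be measured.

```d
import std.regex : Regex, regex, matchFirst;

// Hypothetical helper: return capture group 1 of the original pattern,
// or null when the line does not match.
const(char)[] senderOf(const(char)[] line, Regex!char re)
{
    auto m = line.matchFirst(re);
    return m.empty ? null : m[1];
}

// Usage (sketch): build the pattern once, outside the loop.
//     auto re = regex(r"(?:\S+ ){3,4}<= ([^ ]+ (\S+))");
//     foreach (line; File("exim_mainlog").byLine())
//         if (auto s = senderOf(line, re)) { /* count s */ }
```

The returned slice points into the line it was given, so with `byLine` it should be copied or consumed before the next line is read.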
Apr 04 2019