digitalmars.D.bugs - [Issue 13532] New: std.regex performance (enums; regex vs ctRegex)

https://issues.dlang.org/show_bug.cgi?id=13532

          Issue ID: 13532
           Summary: std.regex performance (enums; regex vs ctRegex)
           Product: D
           Version: D2
          Hardware: All
                OS: All
            Status: NEW
          Keywords: performance
          Severity: enhancement
          Priority: P5
         Component: Phobos
          Assignee: nobody puremagic.com
          Reporter: thecybershadow gmail.com

I noticed something strange after accidentally introducing a performance
regression in a program using std.regex. Benchmark program:

///////////////////////////////////////////
import std.algorithm;
import std.array;
import std.conv;
import std.datetime;
import std.file;
import std.regex;
import std.stdio;
import std.string;
import std.typetuple; // TypeTuple, used for the `funcs` list below

enum expr = `;.*`;
enum repl = "";
enum fn = `alice30.txt`;
enum N = 5000;

string[] lines;

void regexInline()
{
    lines
        .map!(line => line
            .replaceAll(regex(expr), repl)
        )
        .array
    ;
}

void regexAuto()
{
    auto r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void regexStatic()
{
    static r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void regexEnum()
{
    enum r = regex(expr);
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexInline()
{
    lines
        .map!(line => line
            .replaceAll(ctRegex!expr, repl)
        )
        .array
    ;
}

void ctRegexAuto()
{
    auto r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexStatic()
{
    static r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

void ctRegexEnum()
{
    enum r = ctRegex!expr;
    lines
        .map!(line => line
            .replaceAll(r, repl)
        )
        .array
    ;
}

Regex!char re(string pattern)()
{
    static Regex!char r;
    if (r.empty)
        r = regex(pattern);
    return r;
}

void reInline()
{
    lines
        .map!(line => line
            .replaceAll(re!expr, repl)
        )
        .array
    ;
}

alias funcs = TypeTuple!(
    regexInline,
    regexAuto,
    regexStatic,
    regexEnum,
    ctRegexInline,
    ctRegexAuto,
    ctRegexStatic,
    ctRegexEnum,
    reInline,
);

void main()
{
    auto text = cast(string)read(fn);
    lines = text.splitLines();
    auto results = benchmark!funcs(N);
    foreach (i, func; funcs)
        writeln(
            __traits(identifier, func),
            "\t",
            to!Duration(results[i]),
        );
}
///////////////////////////////////////////

Here are my results:

regexInline     10 secs, 174 ms, 254 μs, and 2 hnsecs
regexAuto       8 secs, 249 ms, 92 μs, and 5 hnsecs
regexStatic     8 secs, 155 ms, 231 μs, and 1 hnsec
regexEnum       19 secs, 358 ms, 66 μs, and 8 hnsecs
ctRegexInline   21 secs, 399 ms, 346 μs, and 5 hnsecs
ctRegexAuto     10 secs, 57 ms, and 418 μs
ctRegexStatic   10 secs, 66 ms, 489 μs, and 9 hnsecs
ctRegexEnum     21 secs, 593 ms, 486 μs, and 9 hnsecs
reInline        8 secs, 430 ms, 852 μs, and 3 hnsecs

The first surprise for me was that declaring a regex object (either Regex or
StaticRegex) with "enum" was so much slower: roughly twice as slow as the
"auto" and "static" variants in both cases. It makes sense now that I think
about it: an enum is a manifest constant, so every use creates a new struct
literal inside the loop, which is more expensive than referencing one already
residing somewhere in memory. It might be worth mentioning in the
documentation that enum should be avoided with compiled regexes.
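
The enum effect can be seen without std.regex at all. The following is only a
minimal sketch using a plain array as a stand-in for the regex object; the
names enumTable, staticTable, enumUsesShareStorage and staticUsesShareStorage
are made up for illustration and do not come from Phobos. It assumes the usual
D behavior that an enum array literal is re-created at every use, while a
static immutable array is a single object in memory.

```d
import std.stdio;

// Hypothetical stand-ins for a compiled regex object.
enum int[] enumTable = [1, 2, 3];               // manifest constant, no address of its own
static immutable int[] staticTable = [1, 2, 3]; // one object stored in the binary

// Returns true if two uses of the enum refer to the same memory.
bool enumUsesShareStorage()
{
    auto a = enumTable; // the literal is materialized here...
    auto b = enumTable; // ...and materialized again here
    return a.ptr is b.ptr;
}

// Returns true if two uses of the static variable refer to the same memory.
bool staticUsesShareStorage()
{
    auto a = staticTable; // both slices point at the same stored array
    auto b = staticTable;
    return a.ptr is b.ptr;
}

void main()
{
    writeln("enum uses share storage:   ", enumUsesShareStorage());
    writeln("static uses share storage: ", staticUsesShareStorage());
}
```

If that assumption holds, the enum version should report distinct storage for
each use, which is the same kind of per-use cost that regexEnum and ctRegexEnum
pay once per line in the benchmark above.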

The second surprise was that ctRegex was slower than regular regex, although
the difference is not significant.

I don't know whether this needs any action, feel free to WONTFIX.

--
Sep 25 2014