digitalmars.D - non-utf8-decoding regex (for speed)?
- Timothee Cour via Digitalmars-d (10/10) Apr 05 2016 Is there a way to avoid decoding (as utf8) when calling regex' apis?
- Dmitry Olshansky (7/17) Apr 06 2016 The speed gain for ASCII only vs Unicode with ASCII special case would
Is there a way to avoid decoding (as utf8) when calling regex' apis? or a plan to do so?

use case: speed (no decoding) and avoiding throwing on invalid utf8 sequences

ideally this should allow:
---
auto s = cast(ubyte[]) "abcd"; //potentially not valid utf8 sequence
auto r = cast(ubyte[]) `^\d`;
auto m=match(s, r.regex);
// right now: regex cannot deduce function from argument types !()(ubyte[])
---
Apr 05 2016
On 06-Apr-2016 01:00, Timothee Cour via Digitalmars-d wrote:
> Is there a way to avoid decoding (as utf8) when calling regex' apis? or a plan to do so?

Custom alphabets - yes, including ASCII.

> use case: speed (no decoding) and avoiding throwing on invalid utf8 sequences

The speed gain for ASCII only vs Unicode with ASCII special case would be around 0.5% (the time spent on decoding) as my extensive profiling shows. Latest pull for std.regex did exactly that - special path for ASCII.

> ideally this should allow:
> ---
> auto s = cast(ubyte[]) "abcd"; //potentially not valid utf8 sequence
> auto r = cast(ubyte[]) `^\d`;
> auto m=match(s, r.regex);
> // right now: regex cannot deduce function from argument types !()(ubyte[])
> ---

--
Dmitry Olshansky
Apr 06 2016
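Until something like the requested ubyte[] overload exists, one possible workaround for the "avoid throwing on invalid utf8" half of the use case is to check the bytes before handing them to std.regex. The sketch below is only an illustration under stated assumptions: `isAllAscii` is a hypothetical helper, not part of Phobos, and it only covers input that is pure 7-bit ASCII. It also adds an extra pass over the data, so it does nothing for the speed half of the use case.

---
import std.algorithm.searching : all;
import std.regex : matchFirst, regex;

// Hypothetical helper (not part of std.regex): true when every byte is
// 7-bit ASCII, i.e. the buffer is also valid UTF-8 and safe to view as text.
bool isAllAscii(const(ubyte)[] bytes)
{
    return bytes.all!(b => b < 0x80);
}

void main()
{
    auto re = regex(`^\d`);

    // Raw bytes that happen to be ASCII: reinterpret them as text and match.
    auto raw = cast(const(ubyte)[]) "1abc";
    if (isAllAscii(raw))
    {
        auto m = matchFirst(cast(const(char)[]) raw, re);
        assert(!m.empty && m.hit == "1");
    }

    // 0xFF can never appear in valid UTF-8, so this buffer is rejected
    // before std.regex gets a chance to decode it.
    ubyte[] bad = [0x31, 0xFF];
    assert(!isAllAscii(bad));
}
---

Buffers that contain non-ASCII bytes are simply rejected here; matching them without decoding would need the custom-alphabet support mentioned in the reply above.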