digitalmars.D.bugs - [Issue 5169] New: Add(?:) Non-capturing parentheses group support to std.regex
- d-bugmail puremagic.com (64/64) Nov 05 2010 http://d.puremagic.com/issues/show_bug.cgi?id=5169
- d-bugmail puremagic.com (7/7) Nov 05 2010 http://d.puremagic.com/issues/show_bug.cgi?id=5169
- d-bugmail puremagic.com (12/12) Feb 25 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5169
- d-bugmail puremagic.com (7/7) Mar 01 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5169
- d-bugmail puremagic.com (11/11) Jun 06 2011 http://d.puremagic.com/issues/show_bug.cgi?id=5169
http://d.puremagic.com/issues/show_bug.cgi?id=5169

           Summary: Add(?:) Non-capturing parentheses group support to std.regex
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: dmitry.olsh gmail.com

09:35:15 PDT ---
Intro:
Non-capturing parentheses group part of a regex so that regex operators can be applied to it, but they capture nothing and create no backreferences.

Examples:
//A very dumb example, matches abcabcabc, no backrefs created
(?:abc){3}
//A decent attempt to snatch the href field of an <a> HTML tag, without unnecessary
//backrefs:
<(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>

Rationale:
The ECMA-262 standard mentioned on http://www.digitalmars.com/d/2.0/phobos/std_regex.html requires support for this construct. Sooner or later we should get rid of "however, some of the very advanced forms may behave slightly differently", especially where, as here, closing the gap is simple. See the attached patch.
Creating backreferences is also costly; see the benchmark code and results (uses the proposed patch):

//===bench.d===
import std.regex, std.stdio, std.datetime;

void main(){
    // Same pattern twice: r1 uses non-capturing groups, r2 captures every group.
    auto r1 = regex(`(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>`,"g");
    auto r2 = regex(`(a|A)([^<>]*)href *= *"?([^"<> ]*)"?([^<>]*)>`,"g");
    void nobackref(){
        match(`<a href = http://www.google.com id="G"/>`,r1).hit;
    }
    void backref(){
        match(`<a href = http://www.google.com id="G"/>`,r2).hit;
    }
    // Time 1_000 calls of each function.
    auto bench = benchmark!(nobackref,backref)(1_000);
    writeln("No backref: ",bench[0].milliseconds);
    writeln("With backref: ",bench[1].milliseconds);
}
//======

Results on my machine (milliseconds), min .. max of 10 runs:
No backref: 256.955 .. 267.341
With backref: 580.636 .. 587.187

P.S. I have rebuilt Phobos (on Windows) and run the unit tests; output:

C:\dmd2\src\phobos>unittest.exe
--- std.socket(660) broken test ---
(std.socket.HostException: Address family mismatch)
9abc5a5a12345678
args.length = 1
args[0] = 'C:\dmd2\src\phobos\unittest.exe~T'
Vendor string: AuthenticAMD
Processor string: AMD Phenom(tm) II X4 940 Processor
Signature: Family=16 Model=4 Stepping=2
Features: MMX FXSR SSE SSE2 SSE3 3DNow! 3DNow!+ MMX+ AMD64 HTT
Multithreading: 4 threads / 4 cores
Success!

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Nov 05 2010
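The difference between the two kinds of group described in the Intro above can be checked with a short D sketch. It assumes a Phobos release whose std.regex accepts (?:...) and provides matchFirst; hrefOf is a hypothetical helper name used only for illustration and is not part of the report or the attached patch.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical helper: returns the href value of an <a> tag,
/// or null when the tag does not match the pattern.
string hrefOf(string tag)
{
    // Only the URL sits in capturing parentheses; every other group is (?:...),
    // so the URL is always capture number 1.
    auto re = regex(`<(?:a|A)(?:[^<>]*)href *= *"?([^"<> ]*)"?(?:[^<>]*)>`);
    auto m = matchFirst(tag, re);
    return m.empty ? null : m[1];
}

void main()
{
    // Non-capturing group: only the whole match (index 0) is recorded.
    auto grouped = matchFirst("abcabcabc", regex(`(?:abc){3}`));
    assert(grouped.length == 1);

    // Capturing group: the whole match plus one numbered submatch.
    auto captured = matchFirst("abcabcabc", regex(`(abc){3}`));
    assert(captured.length == 2);

    writeln(hrefOf(`<a href = http://www.google.com id="G"/>`));
}
```

With the capturing variant of the pattern from the benchmark, the URL would instead be capture number 3, which is one practical reason to prefer (?:...) for groups that exist only for grouping.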
09:36:40 PDT ---
Created an attachment (id=800)
Patch for regex.d, enables (?:) regex syntax as per ECMA262
Nov 05 2010
Jerry Quinn <jlquinn optonline.net> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jlquinn optonline.net
           Severity|enhancement                 |normal

Changing from enhancement to a bug, as std.regex is supposed to support ECMA-262.
Feb 25 2011
07:52:03 PST ---
For a fuller regex feature request, with a patch, see:
http://d.puremagic.com/issues/show_bug.cgi?id=5673
Mar 01 2011
Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE

02:27:19 PDT ---
*** This issue has been marked as a duplicate of issue 5673 ***
Jun 06 2011