digitalmars.D.bugs - [Issue 3136] New: Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
- d-bugmail puremagic.com (220/220) Jul 04 2009 http://d.puremagic.com/issues/show_bug.cgi?id=3136
- d-bugmail puremagic.com (52/52) Jul 08 2009 http://d.puremagic.com/issues/show_bug.cgi?id=3136
- d-bugmail puremagic.com (10/10) Oct 11 2009 http://d.puremagic.com/issues/show_bug.cgi?id=3136
- d-bugmail puremagic.com (10/10) Jun 05 2011 http://d.puremagic.com/issues/show_bug.cgi?id=3136
- d-bugmail puremagic.com (12/12) Jun 06 2011 http://d.puremagic.com/issues/show_bug.cgi?id=3136
http://d.puremagic.com/issues/show_bug.cgi?id=3136

           Summary: Incorrect and strange behavior of std.regexp.RegExp if using a pattern with optional prefix and suffix longer than 1 char
           Product: D
           Version: 2.030
          Platform: x86
        OS/Version: Windows
            Status: NEW
          Severity: major
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: marcellognani gmail.com

It seems that std.regexp.RegExp gets confused if I use a pattern with an optional prefix and suffix longer than 1 char.

An expression of the form ([A]{0,2})(C)([D]{0,2}) matches all of "AC", "BC", "CD", "CE", "ACD", "BCE", "ABCDE", "C" (as expected).

An expression of the form ([AB]{0,2})(C)([DE]{0,2}) or ([AB]?[AB]?)(C)([DE]?[DE]?) fails (incorrectly and unexpectedly) in some of the cases above (both "CD" and "CE", for example).

Here is the code:
---
import std.regexp;
import std.stdio;

public
{
    static void main()
    {
        RegExp eTest;

        // Compile a new pattern and announce it.
        void SetExp(string pattern)
        {
            eTest=new RegExp(pattern,"g");
            std.stdio.writeln("Testing expression ",pattern);
        }

        // Run the current pattern on s and print every capture.
        void TryString(string s)
        {
            std.stdio.writeln("Trying on string\"",s,"\":");
            auto captures=eTest.exec(s);
            if(captures.length)
            {
                std.stdio.writeln("Success!");
                foreach(uint i,string capture;captures)
                    std.stdio.writeln(i,"): \"",capture,"\"");
            }
            else
            {
                std.stdio.writeln("Failure!");
            }
        }

        SetExp(r"([A]{0,2})(C)([D]{0,2})");
        TryString("AC");
        TryString("BC");
        TryString("CD");
        TryString("CE");
        TryString("ACD");
        TryString("BCE");
        TryString("ABCDE");
        TryString("C");
        TryString("F");

        SetExp(r"([AB]{0,2})(C)([DE]{0,2})");
        TryString("AC");
        TryString("BC");
        TryString("CD");
        TryString("CE");
        TryString("ACD");
        TryString("BCE");
        TryString("ABCDE");
        TryString("C");
        TryString("F");

        SetExp(r"([AB]?[AB]?)(C)([DE]?[DE]?)");
        TryString("AC");
        TryString("BC");
        TryString("CD");
        TryString("CE");
        TryString("ACD");
        TryString("BCE");
        TryString("ABCDE");
        TryString("C");
        TryString("F");
    }
}
---

Here is the output:
---
Testing expression ([A]{0,2})(C)([D]{0,2})
Trying on string"AC":
Success!
0): "AC"
1): "A"
2): "C"
3): ""
Trying on string"BC":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"CD":
Success!
0): "CD"
1): ""
2): "C"
3): "D"
Trying on string"CE":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"ACD":
Success!
0): "ACD"
1): "A"
2): "C"
3): "D"
Trying on string"BCE":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"ABCDE":
Success!
0): "CD"
1): ""
2): "C"
3): "D"
Trying on string"C":
Success!
0): "C"
1): ""
2): "C"
3): ""
Trying on string"F":
Failure!
Testing expression ([AB]{0,2})(C)([DE]{0,2})
Trying on string"AC":
Success!
0): "AC"
1): "A"
2): "C"
3): ""
Trying on string"BC":
Success!
0): "BC"
1): "B"
2): "C"
3): ""
Trying on string"CD":
Failure!
Trying on string"CE":
Failure!
Trying on string"ACD":
Success!
0): "ACD"
1): "A"
2): "C"
3): "D"
Trying on string"BCE":
Success!
0): "BCE"
1): "B"
2): "C"
3): "E"
Trying on string"ABCDE":
Success!
0): "ABCDE"
1): "AB"
2): "C"
3): "DE"
Trying on string"C":
Failure!
Trying on string"F":
Failure!
Testing expression ([AB]?[AB]?)(C)([DE]?[DE]?)
Trying on string"AC":
Success!
0): "AC"
1): "A"
2): "C"
3): ""
Trying on string"BC":
Success!
0): "BC"
1): "B"
2): "C"
3): ""
Trying on string"CD":
Failure!
Trying on string"CE":
Failure!
Trying on string"ACD":
Success!
0): "ACD"
1): "A"
2): "C"
3): "D"
Trying on string"BCE":
Success!
0): "BCE"
1): "B"
2): "C"
3): "E"
Trying on string"ABCDE":
Success!
0): "ABCDE"
1): "AB"
2): "C"
3): "DE"
Trying on string"C":
Failure!
Trying on string"F":
Failure!
---

Kind regards,
Marcello Gnani

--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 04 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3136

12:06:26 PDT ---
I had the time to investigate further; the problem is related to an incorrect optimization performed by Phobos on the optional prefix.

The constructor of the RegExp object calls "public void compile(string pattern, string attributes)", which builds a correct internal RegExp program; an optimization is then attempted by calling the "void optimize()" function. In this function, during the optimization of the REbit opcode (the opcode that implements the prefix match when the prefix is more than one letter long), the optionality of the prefix is lost, leading to the incorrect behavior reported.

The simplest patch I came up with is a slight modification of the "int starrchars(Range r, const(ubyte)[] prog)" function (which is called by "optimize"). The current code:

. . .
    case REnm:
    case REnmq:
        // len, n, m, ()
        len = (cast(uint *)&prog[i + 1])[0];
        n = (cast(uint *)&prog[i + 1])[1];
        m = (cast(uint *)&prog[i + 1])[2];
        pop = &prog[i + 1 + uint.sizeof * 3];
        if (!starrchars(r, pop[0 .. len]))
            return 0;
        if (n)
            return 1;
        i += 1 + uint.sizeof * 3 + len;
        break;
. . .

should return 0 if the n operand of the REnm opcode is 0 (only the line before the break statement changes); this avoids the insertion of the optionality-killing first filter:

. . .
    case REnm:
    case REnmq:
        // len, n, m, ()
        len = (cast(uint *)&prog[i + 1])[0];
        n = (cast(uint *)&prog[i + 1])[1];
        m = (cast(uint *)&prog[i + 1])[2];
        pop = &prog[i + 1 + uint.sizeof * 3];
        if (!starrchars(r, pop[0 .. len]))
            return 0;
        if (n)
            return 1;
        return 0;   // changed line: n == 0, so the prefix is optional
        break;
. . .

I tried it and it works now. Maybe this also solves some other regexp bug that is still open.

Best regards,
Marcello Gnani
Jul 08 2009
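To illustrate the failure mode described in the Jul 08 2009 comment, here is a minimal toy model in D. It is not the Phobos code: `ToyPattern`, `startChars`, and `mayStartMatch` are hypothetical names, and the pattern is reduced to a character-class prefix with a minimum repetition count. The sketch assumes that the optimizer builds a set of characters a match may start with and rejects input whose first character is not in that set; the `patched` flag mimics the idea of the proposed patch (give up on the filter when the prefix is optional).

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Hypothetical toy model, not the Phobos data structures.
// Only the prefix is modeled, e.g. [AB]{0,2} -> prefixChars "AB", minPrefix 0.
// The mandatory literal that follows (C in the report) is left out because
// the filter in this sketch only looks at the prefix.
struct ToyPattern
{
    string prefixChars; // characters accepted by the prefix class
    uint minPrefix;     // the "n" operand: minimum repetitions of the prefix
}

// Returns true and fills `chars` when a first-character filter is used;
// returns false to signal "no filter, let the full matcher decide".
bool startChars(ToyPattern p, bool patched, out string chars)
{
    chars = p.prefixChars;
    if (p.minPrefix > 0)
        return true;    // mandatory prefix: filtering on it is safe
    if (patched)
        return false;   // idea of the patch: optional prefix, no filter
    return true;        // reported behavior: filter built from the prefix alone
}

// Would the pre-filter allow a match that starts at input[0]?
bool mayStartMatch(ToyPattern p, string input, bool patched)
{
    string chars;
    if (!startChars(p, patched, chars))
        return true;
    return input.length > 0 && chars.canFind(input[0]);
}

void main()
{
    auto p = ToyPattern("AB", 0);
    writeln(mayStartMatch(p, "CD", false)); // false: "CD" is rejected up front
    writeln(mayStartMatch(p, "CD", true));  // true: the matcher gets to run
}
```

In this model the unpatched filter rejects "CD" before the real matcher ever runs, which is consistent with the "Failure!" lines in the report; the patched variant trades the speed-up of the filter for correctness.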
http://d.puremagic.com/issues/show_bug.cgi?id=3136

Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |andrei metalanguage.com
         AssignedTo|nobody puremagic.com        |andrei metalanguage.com
Oct 11 2009
http://d.puremagic.com/issues/show_bug.cgi?id=3136

Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|andrei metalanguage.com     |dmitry.olsh gmail.com

08:11:26 PDT ---
Reassigning to Dmitry.
Jun 05 2011
http://d.puremagic.com/issues/show_bug.cgi?id=3136

Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

08:03:48 PDT ---
Fixed for std.regex
https://github.com/D-Programming-Language/phobos/commit/9afb00e36b625322d7f1d8ec0fbd876c2b5c03fc
Jun 06 2011
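Since the fix landed in std.regex rather than in std.regexp, the strings from the original report can be re-checked against the newer module. The following is a minimal sketch, assuming a current Phobos where `std.regex` provides `regex` and `matchFirst`; the helper name `wholeMatch` is hypothetical and not part of the report or of Phobos.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: returns the whole match for `input`,
// or null when the pattern does not match anywhere.
string wholeMatch(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    return m.empty ? null : m.hit;
}

void main()
{
    // Second pattern from the report: optional two-character-class
    // prefix and suffix around a mandatory "C".
    enum pattern = r"([AB]{0,2})(C)([DE]{0,2})";

    foreach (s; ["AC", "BC", "CD", "CE", "ACD", "BCE", "ABCDE", "C", "F"])
    {
        auto hit = wholeMatch(s, pattern);
        if (hit is null)
            writeln(s, ": Failure!");
        else
            writeln(s, ": Success! \"", hit, "\"");
    }
}
```

With a correct engine, every string except "F" should report success, including "CD", "CE", and "C", which failed in the original report.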