digitalmars.D.learn - regex - match/matchAll and bmatch - different output
- Ivan Kazmenko (20/20) Dec 30 2015 Hi,
- Ivan Kazmenko (3/8) Dec 31 2015 Reported as https://issues.dlang.org/show_bug.cgi?id=15489.
- anonymous (11/24) Jan 01 2016 The `\1` there is a backreference. Backreferences are not part of
- Ivan Kazmenko (15/34) Jan 02 2016 The overview by the module author
Hi,

While solving Advent of Code problems for fun (already discussed in the forum: http://forum.dlang.org/post/cwdkmblukzptsrsrvdkr@forum.dlang.org), I ran into an issue. I wanted to test for the pattern "two consecutive characters, arbitrary sequence, the same two consecutive characters". Sadly, my solution using regular expressions gave a wrong result, but a hand-written one was accepted.

The problem reduced to the following:

import std.regex, std.stdio;
void main ()
{
    writeln (bmatch ("abab", r"(..).*\1")); // [["abab", "ab"]]
    writeln (match ("abab", r"(..).*\1")); // [["abab", "ab"]]
    writeln (matchAll ("abab", r"(..).*\1")); // [["abab", "ab"]]
    writeln (bmatch ("xabab", r"(..).*\1")); // [["abab", "ab"]]
    writeln (match ("xabab", r"(..).*\1")); // []
    writeln (matchAll ("xabab", r"(..).*\1")); // []
}

As you can see, bmatch (usage discouraged in the docs) gives me the result I want, but match (also discouraged) and matchAll (way to go) don't. Am I misusing matchAll, or is this a bug?

Ivan Kazmenko.
Dec 30 2015
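For comparison, the pattern described above ("two consecutive characters, arbitrary sequence, the same two consecutive characters") can be checked without std.regex. The following is only a minimal sketch, not the hand-written solution mentioned in the post (which is not shown in the thread). The name hasRepeatedPair is hypothetical, and the code assumes ASCII input, since it slices by code unit whereas the regex `.` matches a code point.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

/// Returns true if some two-character slice of `s` occurs again later
/// in `s` without overlapping its first occurrence.
/// Assumption: `s` is ASCII, so slicing by code unit is safe.
bool hasRepeatedPair(string s)
{
    // The pair s[i .. i + 2] needs at least two more characters after it.
    for (size_t i = 0; i + 4 <= s.length; ++i)
    {
        // Search only the part of the string after the pair itself.
        if (s[i + 2 .. $].canFind(s[i .. i + 2]))
            return true;
    }
    return false;
}

void main()
{
    writeln(hasRepeatedPair("abab"));  // true
    writeln(hasRepeatedPair("xabab")); // true
    writeln(hasRepeatedPair("abac"));  // false
}
```

A check like this gives a reference answer to compare the regex results against, at the cost of quadratic time in the length of the input.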
On Wednesday, 30 December 2015 at 11:06:55 UTC, Ivan Kazmenko wrote:
> ...
> As you can see, bmatch (usage discouraged in the docs) gives me the result I want, but match (also discouraged) and matchAll (way to go) don't. Am I misusing matchAll, or is this a bug?

Reported as https://issues.dlang.org/show_bug.cgi?id=15489.
Dec 31 2015
On 30.12.2015 12:06, Ivan Kazmenko wrote:
> import std.regex, std.stdio;
> void main ()
> {
>     writeln (bmatch ("abab", r"(..).*\1")); // [["abab", "ab"]]
>     writeln (match ("abab", r"(..).*\1")); // [["abab", "ab"]]
>     writeln (matchAll ("abab", r"(..).*\1")); // [["abab", "ab"]]
>     writeln (bmatch ("xabab", r"(..).*\1")); // [["abab", "ab"]]
>     writeln (match ("xabab", r"(..).*\1")); // []
>     writeln (matchAll ("xabab", r"(..).*\1")); // []
> }
>
> As you can see, bmatch (usage discouraged in the docs) gives me the result I want, but match (also discouraged) and matchAll (way to go) don't. Am I misusing matchAll, or is this a bug?

The `\1` there is a backreference. Backreferences are not part of regular expressions, in the sense that they allow you to describe more than regular languages. [1]

As far as I know, bmatch uses a widespread matching mechanism, while match/matchAll use a different, less common one. It wouldn't surprise me if match/matchAll simply didn't support backreferences.

Backreferences are not documented, as far as I can see, but they're working in other patterns. So, yeah, this is possibly a bug.

[1] https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages
Jan 01 2016
On Friday, 1 January 2016 at 12:29:01 UTC, anonymous wrote:
> On 30.12.2015 12:06, Ivan Kazmenko wrote:
>> As you can see, bmatch (usage discouraged in the docs) gives me the result I want, but match (also discouraged) and matchAll (way to go) don't. Am I misusing matchAll, or is this a bug?
>
> The `\1` there is a backreference. Backreferences are not part of regular expressions, in the sense that they allow you to describe more than regular languages. [1]
>
> As far as I know, bmatch uses a widespread matching mechanism, while match/matchAll use a different, less common one. It wouldn't surprise me if match/matchAll simply didn't support backreferences.
>
> Backreferences are not documented, as far as I can see, but they're working in other patterns. So, yeah, this is possibly a bug.
>
> [1] https://en.wikipedia.org/wiki/Regular_expression#Patterns_for_non-regular_languages

The overview by the module author (http://dlang.org/regular-expression.html) does mention in its last paragraph that backreferences are supported. They look like a common feature in other programming languages, too.

The pattern with "\1" gives the correct result for "abab", "abxab" and "ababx" (a match) and for "abac" (no match). This means backreferences are probably intended to work, and handling "xabab" incorrectly is a bug.

Also, as I understand it from the docs, matchAll/matchFirst use the most appropriate of match/bmatch internally. So if match does not properly support this particular backreference but bmatch does, the bug is in choosing the wrong one to handle the pattern.

At any rate, a wrong result from an 8-character pattern gives the impression that regexes don't work, and I hope something can be done about it.
Jan 02 2016
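The inputs discussed in this thread can be collected into one small program to see how a given Phobos version treats the backreference. This is only a sketch: the helper name hasMatch is hypothetical, it uses matchFirst rather than the match/bmatch calls from the original post, and no output is listed for "xabab" because the result depends on whether the installed std.regex still has the behaviour reported as issue 15489.

```d
import std.regex : matchFirst, regex;
import std.stdio : writefln;

/// True if `s` contains a match of the pattern from the thread.
bool hasMatch(string s)
{
    // Two characters, anything in between, then the same two characters.
    auto re = regex(r"(..).*\1");
    // matchFirst returns a Captures object that is empty when nothing matched.
    return !matchFirst(s, re).empty;
}

void main()
{
    // Inputs mentioned in the thread; "xabab" is the one reported as failing.
    foreach (s; ["abab", "abxab", "ababx", "abac", "xabab"])
        writefln("%-6s %s", s, hasMatch(s) ? "match" : "no match");
}
```

If "xabab" prints "no match" while the other four lines agree with the results quoted above, the installed library shows the behaviour described in the bug report.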