
digitalmars.D.bugs - [Issue 7260] New: "g" on default in std.regex.match

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260

           Summary: "g" on default in std.regex.match
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: bearophile_hugs eml.cc



D2 code:


import std.stdio: write, writeln;
import std.regex: regex, match;

void main() {
    string text = "abc312de";

    foreach (c; text.match("1|2|3|4"))
        write(c, " ");
    writeln();

    foreach (c; text.match(regex("1|2|3|4", "g")))
        write(c, " ");
    writeln();
}


It outputs (DMD 2.058 Head):

["3"] 
["3"] ["1"] ["2"] 


In my code I have found that the "g" option (which means "repeat over the
whole input") is usually what I want.

So what do you think about making "g" the default?



Note: I have not marked this issue as "enhancement" because of this comment by
Dmitry Olshansky (found by drey_ on IRC #D):

http://dfeed.kimsufi.thecybershadow.net/discussion/thread/jc9hrl$2lpp$1 digitalmars.com#post-jc9mag:2430tq:241:40digitalmars.com

 Still, I have to issue yet another warning about the new std.regex compared
 with the old one:
 
 import std.stdio;
 import std.regex;
 
 void main() {
     string src = "4.5.1";
     foreach (c; match(src, regex(r"(\d+)")))
         writeln(c.hit);
 }
 
 Previously this would find all matches; now it finds only the first one. To
 get all of the matches, use the "g" option.
 
 Seems like 100% compatibility was next to impossible.
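Following the advice in the quoted message, here is a minimal sketch of that snippet with and without the "g" option, collecting each match's `hit` so the two behaviours can be compared. The helper name `numbers` and its `global` parameter are illustrative only; the sketch uses the flag-based `match` discussed in this thread, which the thread later supersedes with `matchAll`/`matchFirst`.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : match, regex;
import std.stdio : writeln;

// Illustrative helper: collects the whole-match text of every element that
// match() yields. Without "g" the range stops after the first match; with
// "g" it walks the whole input.
string[] numbers(string src, bool global)
{
    auto re = regex(r"(\d+)", global ? "g" : "");
    return match(src, re).map!(c => c.hit).array;
}

void main()
{
    writeln(numbers("4.5.1", false)); // ["4"]
    writeln(numbers("4.5.1", true));  // ["4", "5", "1"]
}
```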
-- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jan 09 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dmitry.olsh gmail.com



12:21:44 PST ---
I dunno how to "fix" this bug. "g" by default implies there is a way to override
it. regex("blah","") ?
Leaving it as is now breaks old codebases that rely on "g" (though there should
be more of legacy std.regexp code out there).
Making "g" on by default affects old code only inside foreach and generic
constructs that show all matches or iterate over them; it's rare but non-zero.

Another way would be to ditch the current API, which I think is not ideal btw ;)

Feb 24 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260





 I dunno how to "fix" this bug. "g" by default implies there is a way to override
 it. regex("blah","") ?
 Leaving it as is now breaks old codebases that rely on "g" (though there should
 be more of legacy std.regexp code out there).
 Making "g" on by default affects old code only inside foreach and generic
 constructs that show all matches or iterate over them; it's rare but non-zero.
 
 Another way would be to ditch the current API, which I think is not ideal btw ;)

Fully ditching the currently used API is probably too much. A possible idea:

regex("blah")      <<== repeat over the whole input
regex("blah", "")  <<== repeat over the whole input
regex("blah", "g") <<== repeat over the whole input
regex("blah", "d") <<== doesn't repeat over the whole input

So far you have done good work on the regular expression implementation, so I
trust your work. Thank you.
Feb 24 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260


SomeDude <lovelydear mailmetrash.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |lovelydear mailmetrash.com
           Severity|normal                      |enhancement


Apr 19 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal



This is not an enhancement request (I consider it more of a small Phobos
regression).

Apr 19 2012
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|"g" on default in           |"g" on default in std.regex
                   |std.regex.match             |



If changing std.regex.regex is not possible, then an alternative solution is to
introduce a new small function, "std.regex.re", that repeats by default, i.e.:

re(someString) === regex(someString, "g")

re(someString, "d") === regex(someString, "dg")
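The proposed `re` can be sketched as a thin wrapper over `regex`. This is a hypothetical helper, not part of std.regex. It appends "g" only when the caller has not already passed it, so the flag is never given twice. Because "d" is not one of the std.regex flags, the usage below demonstrates the extra-flags case with "i" (case-insensitive) instead.

```d
import std.algorithm : canFind, map;
import std.array : array;
import std.regex : Regex, match, regex;
import std.stdio : writeln;

// Hypothetical helper (not in std.regex): behaves like regex(), except that
// the "g" (global) flag is always switched on.
Regex!char re(string pattern, string flags = "")
{
    return regex(pattern, flags.canFind('g') ? flags : flags ~ "g");
}

void main()
{
    // Every match is visited, as if "g" had been passed explicitly.
    writeln("abc312de".match(re("1|2|3|4")).map!(c => c.hit).array); // ["3", "1", "2"]
    // Other flags still apply: "i" makes the pattern case-insensitive.
    writeln("aXbx".match(re("x", "i")).map!(c => c.hit).array);       // ["X", "x"]
}
```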

Jan 24 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260




12:22:46 PST ---

 If changing std.regex.regex is not possible, then an alternative solution is to
 introduce a new small function, "std.regex.re", that repeats by default, i.e.:
 
 re(someString) === regex(someString, "g")
 
 re(someString, "d") === regex(someString, "dg")

Frankly this is stupid (sorry). Obviously the root of the problem is that people
(rightfully so) associate "find all" vs "find first" with the operation
("match"/"replace"), not with the "regex" as in the pattern itself.

Personally I think we had better go with explicit overrides on
"match"/"replace"/etc. and very slowly deprecate the "g" switch. How the
override will look is then up for debate:

match(someString, pattern).all   //range of all matches
match(someString, pattern).first //only the first one
match(someString, pattern)       // using the "g" flag to decide

Or pass the override as an optional parameter to match:

match(someString, pattern, Regex.all);
match(someString, pattern, Regex.first);
match(someString, pattern); //use the flag

I'll probably open a poll to pick the better one.
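The second variant (an explicit mode argument instead of a flag on the pattern) could look roughly like the sketch below. `Mode` and `hits` are hypothetical names; the proposal spells the argument `Regex.all`/`Regex.first`, which does not exist in std.regex. The sketch leans on `matchAll`/`matchFirst` from later Phobos releases, and the "use the flag" variant is left out.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical mode selector standing in for the proposed Regex.all/Regex.first.
enum Mode { all, first }

// Returns the matched text: every match for Mode.all, at most the first one
// for Mode.first. The mode comes from the call, not from a pattern flag.
string[] hits(string input, string pattern, Mode mode)
{
    auto re = regex(pattern);
    if (mode == Mode.all)
        return input.matchAll(re).map!(c => c.hit).array;
    auto c = input.matchFirst(re);
    return c.empty ? null : [c.hit];
}

void main()
{
    writeln(hits("abc312de", "1|2|3|4", Mode.all));   // ["3", "1", "2"]
    writeln(hits("abc312de", "1|2|3|4", Mode.first)); // ["3"]
}
```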
Jan 25 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260




10:43:30 PDT ---

 If changing std.regex.regex is not possible, then an alternative solution is to
 introduce a new small function, "std.regex.re", that repeats by default, i.e.:
 
 re(someString) === regex(someString, "g")
 
 re(someString, "d") === regex(someString, "dg")

Here is a plan based on one of my previous ideas that I think is clean enough,
given the circumstances and the fact that e.g. this Perl-ism is fairly popular
in certain circles (namely, attaching the mode of operation to the pattern
itself, as in /`pattern`/`mode-suffix`).

What we do is first specify that "g" serves only as the intended default "mode"
of this pattern. Then we introduce a simple and elegant way to explicitly
specify what mode of matching to use: first, all, or the default for this
pattern.

Then your code looks like this (I'm still pondering better names/ways for
overriding the default):

void main() {
    string text = "abc312de";

    foreach (c; text.match("1|2|3|4").first)
        write(c, " ");
    writeln();

    foreach (c; text.match(regex("1|2|3|4")).all) //could use string pattern as above
        write(c, " ");
    writeln();
}

Then I'd try to do the same with replace. Using no override would imply "use
whatever the default mode is".

How does it sound?

Then we place a nice bold warning that use of the "g" option is discouraged, is
provided only for compatibility, and is going to be deprecated in the future.

A year later, and depending on the mood of people, it finally gets deprecated
and slowly shifted towards oblivion.

I'll probably cross-post this to the NG to collect opinions, since this is the
largest pain point of the otherwise fine interface.
Mar 10 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260






 match(someString, pattern).all //range of all matches
 match(someString, pattern).first //only the first one
 match(someString, pattern) // using the "g" flag to decide
 Using no override would imply "use whatever the default mode is".
 
 How does it sound? 
 
 Then we place a nice bold warning that use of the "g" option is discouraged, is
 provided only for compatibility, and is going to be deprecated in the future.
 
 A year later, and depending on the mood of people, it finally gets deprecated
 and slowly shifted towards oblivion.
Once "g" is deprecated, what will match(someString, pattern) (without all and
first) do?
Mar 10 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260




11:54:55 PDT ---


 
 match(someString, pattern).all //range of all matches
 match(someString, pattern).first //only the first one
 match(someString, pattern) // using the "g" flag to decide
 Using no override would imply "use whatever the default mode is".
 
 How does it sound? 
 
 Then we place a nice bold warning that use of the "g" option is discouraged, is
 provided only for compatibility, and is going to be deprecated in the future.
 
 A year later, and depending on the mood of people, it finally gets deprecated
 and slowly shifted towards oblivion.
Once "g" is deprecated, what will match(someString, pattern) (without all and
first) do?

Could go both ways. The other possibility I just thought about is:

match(...).first - the same as the current match(...).front, i.e. a simpler
                   interface for the case when only 1 match is needed
match(...).all   - the same as the current match(...) with "g" overridden,
                   i.e. a range

Then, once "g" is off, we could either make .all a no-op.

The alternative is to make it an opaque object that has only 2 methods,
.first/.all.

The third alternative is to add alias this to make .first implicit. I feel it
won't work reliably with range-based templates as it would make it "2 ranges in
one".

So only the first 2 are viable. I'd go with the 1st, which gets upgraded to the
second once people forget about the "g" switch entirely.
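The "opaque object" alternative can be sketched as a small struct that is deliberately not a range and only offers `.first` and `.all`, so callers must choose a mode explicitly. `MatchRequest` and `request` are hypothetical names standing in for the proposed `match(...)` result; the sketch is built on `matchFirst`/`matchAll` from later Phobos releases.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : Regex, matchAll, matchFirst, regex;
import std.stdio : writeln;

// Hypothetical opaque result: not a range itself; it only exposes .first and
// .all, so the matching mode is picked at the call site.
struct MatchRequest
{
    private string input;
    private Regex!char re;

    // Captures of the first match only (empty when nothing matched).
    auto first() { return matchFirst(input, re); }

    // Range over the captures of every match in the input.
    auto all() { return matchAll(input, re); }
}

// Hypothetical factory playing the role of the proposed match(...).
MatchRequest request(string input, string pattern)
{
    return MatchRequest(input, regex(pattern));
}

void main()
{
    auto m = request("abc312de", "1|2|3|4");
    writeln(m.first.hit);                  // 3
    writeln(m.all.map!(c => c.hit).array); // ["3", "1", "2"]
}
```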
Mar 10 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260




12:38:45 PDT ---


 
 Then, once "g" is off, we could either make .all a no-op.
 
 The alternative is to make it an opaque object that has only 2 methods,
 .first/.all.
 
 The third alternative is to add alias this to make .first implicit. I feel it
 won't work reliably with range-based templates as it would make it "2 ranges in
 one".
 
 So only the first 2 are viable. I'd go with the 1st, which gets upgraded to the
 second once people forget about the "g" switch entirely.
Typo: I meant make it an opaque object, then some time later make .all
implicit. That would still have the potential to break code, so it seems that
just making .all implicit is better.
Mar 10 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260




05:33:49 PDT ---
The problem should now be addressed by this pull request:
https://github.com/D-Programming-Language/phobos/pull/1470

There are now matchAll/matchFirst calls, which are the preferred way to go
about matching. Currently they simply override the global flag if present.
Returning to the original example:

foreach (c; text.matchAll("1|2|3|4")) //iterates over the captures of each match
    write(c, " ");
writeln();

foreach (c; text.matchFirst("1|2|3|4")) //iterates over the submatches of the 1st match
    write(c, " ");
writeln();

To me there is little else to do aside from slooowly deprecating the old
flag-based match/replace interface.
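A self-contained version of the example above, reduced to the matched text so the two calls are easy to compare. The helper names `allHits` and `firstHit` are illustrative only. As described in the message, `matchAll` walks the whole input and `matchFirst` returns just the first match, regardless of any "g" flag on the pattern.

```d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

// Illustrative helper: the text of every match in the input.
string[] allHits(string text, string pattern)
{
    return text.matchAll(regex(pattern)).map!(c => c.hit).array;
}

// Illustrative helper: the text of the first match, or null when none.
string firstHit(string text, string pattern)
{
    auto c = text.matchFirst(regex(pattern));
    return c.empty ? null : c.hit;
}

void main()
{
    writeln(allHits("abc312de", "1|2|3|4"));  // ["3", "1", "2"]
    writeln(firstHit("abc312de", "1|2|3|4")); // 3
    writeln(allHits("4.5.1", `\d+`));         // ["4", "5", "1"]
}
```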

Aug 17 2013
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=7260


Dmitry Olshansky <dmitry.olsh gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WONTFIX



01:12:35 PDT ---
Flags are going away one day, and "g" by default is not going to happen.
IMHO this makes it WONTFIX.

Anyhow, the core issue should now be addressed by the new API, which is
clearer.

Sep 22 2013