digitalmars.D - Re: Is str ~ regex the root of all evil, or the leaf of all good?
- bearophile <bearophileHUGS lycos.com> Feb 19 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 19 2009
- Derek Parnell <derek psych.ward> Feb 19 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 19 2009
- bearophile <bearophileHUGS lycos.com> Feb 19 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 19 2009
- bearophile <bearophileHUGS lycos.com> Feb 19 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 19 2009
- "jovo" <jovo at.home> Feb 19 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 19 2009
- "jovo" <jovo at.home> Feb 19 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 19 2009
- KennyTM~ <kennytm gmail.com> Feb 19 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Feb 19 2009
- Bill Baxter <wbaxter gmail.com> Feb 19 2009
- "Denis Koroskin" <2korden gmail.com> Feb 19 2009
- Bill Baxter <wbaxter gmail.com> Feb 19 2009
- "Denis Koroskin" <2korden gmail.com> Feb 19 2009
Andrei Alexandrescu:
I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex
I think I don't like the "g".
-----------------------
To test an API it's often good to try to use it or compare it against similar practical&common operations done with another language or library. So here I show two examples in Python. You can try to translate such two operations with the std.re of D2 to see how they become :-)
The first example shows the usage of a callable for re.sub() (in D it may be called replace()). Here replacer() is a user-defined function given to re.sub()/matchobj.sub() that they call on each match. Note that in Python functions are objects, so I have dynamically added to the replacer() function an instance attribute named "counter". In D (and Python) you can do the same thing creating a small class with counter attribute.
    import re
    def replacer(mobj):
        replacer.counter += 1
        return "REPL%02d" % replacer.counter
    replacer.counter = 0
    s1 = ".......TAG............TAG................TAG..........TAG....."
    result = ".......REPL01............REPL02................REPL03..........REPL04..."
    r = re.sub("TAG", replacer, s1)
    assert r == result
----------
This is a little example of managing groups in Python:
    import re
    data = ">hello1 how are5 you?<"
    patt = re.compile(r".*?(hello\d).*?(are\d).*")
    patt.match(data).groups()
(notes that here all groups are found eagerly. If you want a lazy matching in Python you have to use re.finditer() or matchobj.finditer()).
I may like a syntax similar to this, where opIndex() allows to find the matched group:
    patt.match(data)[0]
    patt.match(data)[1]
Bye, bearophile
Feb 19 2009
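As a sketch of the "small class with counter attribute" variant mentioned in the post above: the class name Replacer and the shortened test string are illustrative choices, not taken from the thread. The only library call used is re.sub() with a callable replacement.

```python
import re


class Replacer:
    """Callable object that numbers each replacement it produces."""

    def __init__(self):
        self.counter = 0

    def __call__(self, mobj):
        # re.sub() calls this once per match, passing the match object.
        self.counter += 1
        return "REPL%02d" % self.counter


s1 = "..TAG....TAG..TAG."
replacer = Replacer()
r = re.sub("TAG", replacer, s1)
print(r)
```

Keeping the counter on an instance rather than on the function object means two independent replacements do not share state.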
bearophile wrote:
Andrei Alexandrescu:
I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex
I think I don't like the "g".
How can anyone think they don't like something? You like it or not, but it's not the result of a thought process. I guess. Anyway: g is from Perl. Let's keep it that way. Andrei
Feb 19 2009
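For readers who know the flags only from Perl, here is a rough Python comparison. It is an analogy, not the proposed D API: Python has no "g" flag, re.sub() replaces every match by default, and count=1 approximates the non-global behaviour. The sample text is made up.

```python
import re

text = "Tag one\ntag two\nTAG three"

# "g" + "i": replace every match, ignoring case (re.sub is global by default).
all_replaced = re.sub("tag", "X", text, flags=re.IGNORECASE)

# Without "g": stop after the first match.
first_only = re.sub("tag", "X", text, count=1, flags=re.IGNORECASE)

# "m": ^ matches at the start of every line, not just the start of the string.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)
```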
On Thu, 19 Feb 2009 07:51:46 -0800, Andrei Alexandrescu wrote:
bearophile wrote:
Andrei Alexandrescu:
I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex
I think I don't like the "g".
How can anyone think they don't like something? You like it or not, but it's not the result of a thought process. I guess.
It is not a question of whether one likes or doesn't like; this expression is attempting to say something about one's level of certainty about liking something. That is to say, one might not be positive if they *know* if they like something or not, therefore they *think* (suspect, but not have definitive evidence) of their stance.
Anyway: g is from Perl. Let's keep it that way.
Perfect justification ;-)
--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
Feb 19 2009
Derek Parnell wrote:
On Thu, 19 Feb 2009 07:51:46 -0800, Andrei Alexandrescu wrote:
bearophile wrote:
Andrei Alexandrescu:
I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex
I think I don't like the "g".
it's not the result of a thought process. I guess.
It is not a question of whether one likes or doesn't like; this expression is attempting to say something about one's level of certainty about liking something. That is to say, one might not be positive if they *know* if they like something or not, therefore they *think* (suspect, but not have definitive evidence) of their stance.
I see. Me, I always use "think" to evoke an actual thinking process. Otherwise I use "feel" or "believe". (This turns out to be important in various interpersonal interactions, e.g. do you want to drive the conversation towards thoughts or feelings? Guess which is gonna get you a date :o).) So by definition I can't think I like something. But I understand how some may use "I think" as a synonym for "Without being sure, to me it seems". Andrei
Feb 19 2009
bearophile wrote:
Andrei Alexandrescu:
I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex
I think I don't like the "g".
-----------------------
To test an API it's often good to try to use it or compare it against similar practical&common operations done with another language or library. So here I show two examples in Python. You can try to translate such two operations with the std.re of D2 to see how they become :-)
The first example shows the usage of a callable for re.sub() (in D it may be called replace()). Here replacer() is a user-defined function given to re.sub()/matchobj.sub() that they call on each match. Note that in Python functions are objects, so I have dynamically added to the replacer() function an instance attribute named "counter". In D (and Python) you can do the same thing creating a small class with counter attribute.
    import re
    def replacer(mobj):
        replacer.counter += 1
        return "REPL%02d" % replacer.counter
    replacer.counter = 0
    s1 = ".......TAG............TAG................TAG..........TAG....."
    result = ".......REPL01............REPL02................REPL03..........REPL04..."
    r = re.sub("TAG", replacer, s1)
    assert r == result
----------
Excellent idea. Let's see:
    uint counter;
    string replacer(string) { return format("REPL%02d", counter++); }
    auto s1 = ".......TAG............TAG................TAG..........TAG.....";
    auto result = ".......REPL01............REPL02................REPL03..........REPL04...";
    r = replace!(replacer)(s1, "TAG");
    assert(r == result);
This is a little example of managing groups in Python:
    import re
    data = ">hello1 how are5 you?<"
    patt = re.compile(r".*?(hello\d).*?(are\d).*")
    patt.match(data).groups()

    auto data = ">hello1 how are5 you?<";
    auto iter = match(data, regex(r".*?(hello\d).*?(are\d).*"));
    foreach (i; 0 .. iter.engine.captures)
        writeln(iter.capture[i]);
(notes that here all groups are found eagerly. If you want a lazy matching in Python you have to use re.finditer() or matchobj.finditer()).
I may like a syntax similar to this, where opIndex() allows to find the matched group:
    patt.match(data)[0]
    patt.match(data)[1]
No go due to confusions with random-access ranges.
Andrei
Feb 19 2009
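For comparison only: later Python versions (3.6 and up) added the indexing that bearophile asks for, so a match object can be subscripted directly. The sketch below reuses the data and pattern from the post. It says nothing about how std.re should resolve the clash with random-access ranges.

```python
import re

data = ">hello1 how are5 you?<"
patt = re.compile(r".*?(hello\d).*?(are\d).*")
m = patt.match(data)

whole = m[0]           # index 0 is the entire matched text
first = m[1]           # index 1 is the first capture
second = m[2]          # index 2 is the second capture
captures = m.groups()  # all captures at once, excluding the whole match
```

Here indexing selects a capture within one match. It never advances to another match, which is the distinction the thread goes on to argue about.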
Andrei Alexandrescu:
Excellent idea. Let's see:
Thank you for all your work and the will to answer the posts here. Some usable API is slowly shaping up :-)
    uint counter;
    string replacer(string) { return format("REPL%02d", counter++); }
    auto s1 = ".......TAG............TAG................TAG..........TAG.....";
    auto result = ".......REPL01............REPL02................REPL03..........REPL04...";
    r = replace!(replacer)(s1, "TAG");
    assert(r == result);
It looks good enough. With a static variable it may become:
    string replacer(string) {
        static int counter;
        return format("REPL%02d", counter++);
    }
With small struct/class it may become:
    struct Replacer {
        int counter;
        string opCall(string s) {
            this.counter++;
            return format("REPL%02d", counter);
        }
    }
-------------------
    auto data = ">hello1 how are5 you?<";
    auto iter = match(data, regex(r".*?(hello\d).*?(are\d).*"));
    foreach (i; 0 .. iter.engine.captures)
        writeln(iter.capture[i]);
I don't understand that. What's the purpose of ".engine"? "captures" may be better named "ngroups" or "ncaptures", or you may just use the .len/.length attribute in some way.
    foreach (i, group; iter.groups)
        writeln(i, " ", group);
"group" may be a struct that defines toString and can be cast to string, and also keeps the starting position of the group into the original string.
Bye, bearophile
Feb 19 2009
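A minimal sketch of the "group" value bearophile describes: something that prints as its text and also remembers where it started in the original string. The names Group and groups_with_positions are hypothetical. The library calls used are Match.group(), Match.start() and Pattern.groups.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    """One capture: its text plus its start offset in the searched string."""
    text: str
    start: int

    def __str__(self):
        return self.text


def groups_with_positions(m):
    # Capture numbering starts at 1; index 0 would be the whole match.
    return [Group(m.group(i), m.start(i)) for i in range(1, m.re.groups + 1)]


data = ">hello1 how are5 you?<"
m = re.match(r".*?(hello\d).*?(are\d).*", data)
found = groups_with_positions(m)
for i, group in enumerate(found):
    print(i, group, group.start)
```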
bearophile wrote:
Andrei Alexandrescu:
    auto data = ">hello1 how are5 you?<";
    auto iter = match(data, regex(r".*?(hello\d).*?(are\d).*"));
    foreach (i; 0 .. iter.engine.captures)
        writeln(iter.capture[i]);
I don't understand that. What's the purpose of ".engine"?
It's the regex engine that has generated the match. I coded that wrong in two different ways, it should have been:
    foreach (i; 0 .. iter.captures)
        writeln(iter.capture(i));
"captures" may be better named "ngroups" or "ncaptures", or you may just use the .len/.length attribute in some way.
"Capture" is the traditional term as far as I understand. I can't use .length because it messes up with range semantics. "len" would be too confusing. "ncaptures" is too cute. Nobody's perfect :o).
    foreach (i, group; iter.groups)
        writeln(i, " ", group);
"group" may be a struct that defines toString and can be cast to string, and also keeps the starting position of the group into the original string.
That sounds good.
Andrei
Feb 19 2009
Andrei Alexandrescu:
    foreach (i; 0 .. iter.captures)
        writeln(iter.capture(i));
"Capture" is the traditional term as far as I understand. I can't use .length because it messes up with range semantics. "len" would be too confusing. "ncaptures" is too cute. Nobody's perfect :o).
"group" may be a struct that defines toString and can be cast to string, and also keeps the starting position of the group into the original string.
That sounds good.
Well, then match() may return just a dynamic array of such groups/captures. So such array has both .length and opIndex. It looks simple :-) Bye, bearophile
Feb 19 2009
bearophile wrote:
Andrei Alexandrescu:
    foreach (i; 0 .. iter.captures)
        writeln(iter.capture(i));
"Capture" is the traditional term as far as I understand. I can't use .length because it messes up with range semantics. "len" would be too confusing. "ncaptures" is too cute. Nobody's perfect :o).
"group" may be a struct that defines toString and can be cast to string, and also keeps the starting position of the group into the original string.
That sounds good.
Well, then match() may return just a dynamic array of such groups/captures. So such array has both .length and opIndex. It looks simple :-)
Looks simple but it isn't. How do you advance to the next match?
    foreach (m; "abracadabra".match("(.)a", "g"))
        writeln(m.capture[0]);
This should print:
    r
    c
    d
    r
There's need to make progress in the matching, not in the capture. How do you distinguish among them?
Andrei
Feb 19 2009
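The same distinction in Python terms, as a rough illustration: re.finditer() advances from match to match, while group(n) selects a capture inside the current match. The helper name first_captures is made up for this sketch.

```python
import re


def first_captures(pattern, text):
    """Return capture 1 of every successive match of pattern in text."""
    # The loop advances through matches; group(1) picks a capture within each.
    return [m.group(1) for m in re.finditer(pattern, text)]


letters = first_captures(r"(.)a", "abracadabra")
for letter in letters:
    print(letter)
# prints r, c, d, r on separate lines
```

A flat array of captures, as proposed earlier in the thread, could express the inner selection but not the outer loop.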
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:gnk8te$cgl$1 digitalmars.com...Looks simple but it isn't. How do you advance to the next match? foreach (m; "abracadabra".match("(.)a", "g")) writeln(m.capture[0]); This should print: r c d r There's need to make progress in the matching, not in the capture. How do you distinguish among them? Andrei
    foreach(capture; match(s, r))
        foreach(group; capture)
            writeln(group);
Feb 19 2009
jovo wrote:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:gnk8te$cgl$1 digitalmars.com...
Looks simple but it isn't. How do you advance to the next match?
    foreach (m; "abracadabra".match("(.)a", "g"))
        writeln(m.capture[0]);
This should print:
    r
    c
    d
    r
There's need to make progress in the matching, not in the capture. How do you distinguish among them?
Andrei
    foreach(capture; match(s, r))
        foreach(group; capture)
            writeln(group);
The consecrated terminology is:
    foreach(match; match(s, r))
        foreach(capture; match)
            writeln(capture);
"Group" is a group defined without an intent to capture. A "capture" is a group that also binds to the state of the match.
Anyhow... this can be done but things get a tad more confusing for other uses. How about this:
    foreach(match; match(s, r))
        foreach(capture; match.captures)
            writeln(capture);
?
Andrei
Feb 19 2009
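To make the group/capture terminology concrete, here is a hedged Python sketch with an invented pattern and input. (?:...) is a group that does not capture, plain parentheses capture, and the nested loop mirrors the "foreach match, foreach capture" shape discussed above.

```python
import re

# One non-capturing group followed by three captures.
patt = re.compile(r"(?:\w+)=(\d+);(\d+);(\d+)")
text = "x=1;2;3 y=4;5;6"

rows = []
for m in patt.finditer(text):      # outer loop: one iteration per match
    row = []
    for capture in m.groups():     # inner loop: the captures of this match
        row.append(capture)
    rows.append(row)

captures_per_match = patt.groups   # counts captures only, not (?:...) groups
match_count = len(rows)
```

The two counts differ here (three captures, two matches), which is the ambiguity a bare .count or .length would hide.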
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:gnkc24$hul$1 digitalmars.com...The consecrated terminology is: foreach(match; match(s, r)) foreach(capture; match) writeln(capture); "Group" is a group defined without an intent to capture. A "capture" is a group that also binds to the state of the match. Anyhow... this can be done but things get a tad more confusing for other uses. How about this: foreach(match; match(s, r)) foreach(capture; match.captures) writeln(capture); ? Andrei
I think you must answer this question more generally, same for all library. May be both?
Feb 19 2009
jovo wrote:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message news:gnkc24$hul$1 digitalmars.com...
The consecrated terminology is:
    foreach(match; match(s, r))
        foreach(capture; match)
            writeln(capture);
"Group" is a group defined without an intent to capture. A "capture" is a group that also binds to the state of the match.
Anyhow... this can be done but things get a tad more confusing for other uses. How about this:
    foreach(match; match(s, r))
        foreach(capture; match.captures)
            writeln(capture);
?
Andrei
I think you must answer this question more generally, same for all library. May be both?
I'd hate to fall again into the fallacy of trying to appease everyone's taste. Really std.regexp has set a negative record with the incredible array of names: find, search, exec, match, test, and probably I forgot a couple. Also it has offered a variety of random features in both free-function and member-function format, not even always doing the same thing. Germans have a saying: "Kurz und gut". Let's make it short and good.
Andrei
Feb 19 2009
Bill Baxter wrote:
I don't like the syntax I saw somewhere earlier in the thread of
    0..iter.captures
.captures looks like it should be a set of captures, not a count. This is a need that comes up again and again -- querying the size, or count, or length of some sub-element like this -- so I think it would greatly benefit Phobos to choose some less ambiguous convention and stick to it. Like nCaptures, numCaptures, capturesLength, etc.
---bb
iter.count
Feb 19 2009
Denis Koroskin wrote:
On Thu, 19 Feb 2009 23:23:13 +0300, Bill Baxter <wbaxter gmail.com> wrote:
I don't like the syntax I saw somewhere earlier in the thread of
    0..iter.captures
.captures looks like it should be a set of captures, not a count. This is a need that comes up again and again -- querying the size, or count, or length of some sub-element like this -- so I think it would greatly benefit Phobos to choose some less ambiguous convention and stick to it. Like nCaptures, numCaptures, capturesLength, etc.
---bb
Agree. I thought that iter.captures is a set (range) of captures.
I'm done implementing that.
Andrei
Feb 19 2009
On Fri, Feb 20, 2009 at 9:47 AM, KennyTM~ <kennytm gmail.com> wrote:
Bill Baxter wrote:
I don't like the syntax I saw somewhere earlier in the thread of
    0..iter.captures
.captures looks like it should be a set of captures, not a count. This is a need that comes up again and again -- querying the size, or count, or length of some sub-element like this -- so I think it would greatly benefit Phobos to choose some less ambiguous convention and stick to it. Like nCaptures, numCaptures, capturesLength, etc.
---bb
iter.count
Maybe I haven't paid close enough attention here, but I think the reason he didn't say .count or .length is that it's ambiguous whether it means the number of captures or the number of matches.
--bb
Feb 19 2009
On Thu, 19 Feb 2009 19:00:41 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
[snip]
This is a little example of managing groups in Python:
    import re
    data = ">hello1 how are5 you?<"
    patt = re.compile(r".*?(hello\d).*?(are\d).*")
    patt.match(data).groups()

    auto data = ">hello1 how are5 you?<";
    auto iter = match(data, regex(r".*?(hello\d).*?(are\d).*"));
    foreach (i; 0 .. iter.engine.captures)
        writeln(iter.capture[i]);
I would expect that to be
    foreach (/*Capture */ i; 0 .. iter.engine.captures)
        writeln(i);
(notes that here all groups are found eagerly. If you want a lazy matching in Python you have to use re.finditer() or matchobj.finditer()).
I may like a syntax similar to this, where opIndex() allows to find the matched group:
    patt.match(data)[0]
    patt.match(data)[1]
No go due to confusions with random-access ranges.
Why iter.capture[0] and iter.capture[1] aren't good enough? How are they different from iter.engine.captures[0] and iter.engine.captures[1]? Why it is a no go if you access iter.captures as a random-access range? I'm sorry if these are dumb questions, but the code you've shown is a bit confusing (these iter.engine.captures and iter.captures).
Andrei
Feb 19 2009
Denis Koroskin wrote:
On Thu, 19 Feb 2009 19:00:41 +0300, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
[snip]
This is a little example of managing groups in Python:
    import re
    data = ">hello1 how are5 you?<"
    patt = re.compile(r".*?(hello\d).*?(are\d).*")
    patt.match(data).groups()

    auto data = ">hello1 how are5 you?<";
    auto iter = match(data, regex(r".*?(hello\d).*?(are\d).*"));
    foreach (i; 0 .. iter.engine.captures)
        writeln(iter.capture[i]);
I would expect that to be
    foreach (/*Capture */ i; 0 .. iter.engine.captures)
        writeln(i);
(notes that here all groups are found eagerly. If you want a lazy matching in Python you have to use re.finditer() or matchobj.finditer()).
I may like a syntax similar to this, where opIndex() allows to find the matched group:
    patt.match(data)[0]
    patt.match(data)[1]
No go due to confusions with random-access ranges.
Why iter.capture[0] and iter.capture[1] aren't good enough? How are they different from iter.engine.captures[0] and iter.engine.captures[1]? Why it is a no go if you access iter.captures as a random-access range? I'm sorry if these are dumb questions, but the code you've shown is a bit confusing (these iter.engine.captures and iter.captures).
They're good. The code I posted was dumb. The "engine" thing does not belong there, and "captures" should be indeed a random-access range.
Andrei
Feb 19 2009
I don't like the syntax I saw somewhere earlier in the thread of
    0..iter.captures
.captures looks like it should be a set of captures, not a count. This is a need that comes up again and again -- querying the size, or count, or length of some sub-element like this -- so I think it would greatly benefit Phobos to choose some less ambiguous convention and stick to it. Like nCaptures, numCaptures, capturesLength, etc.
---bb
Feb 19 2009
On Thu, 19 Feb 2009 23:23:13 +0300, Bill Baxter <wbaxter gmail.com> wrote:
I don't like the syntax I saw somewhere earlier in the thread of
    0..iter.captures
.captures looks like it should be a set of captures, not a count. This is a need that comes up again and again -- querying the size, or count, or length of some sub-element like this -- so I think it would greatly benefit Phobos to choose some less ambiguous convention and stick to it. Like nCaptures, numCaptures, capturesLength, etc.
---bb
Agree. I thought that iter.captures is a set (range) of captures.
Feb 19 2009