digitalmars.D.learn - Question about using regex
- James Oliphant (21/21) Mar 21 2012 While following the regex discussion, I have been compiling the examples...
- Dmitry Olshansky (21/42) Mar 21 2012 Mm-hm it means the fix to use size_t by default is in upstream, but not
- Dmitry Olshansky (5/51) Mar 21 2012 Oh wait, it's in this chapter :) I probably should make more noise about...
While following the regex discussion, I have been compiling the examples to help my understanding of how it works.

In the examples on Dmitry's page (http://blackwhale.github.com/regular-expression.html) and on the dlang.org website (http://dlang.org/phobos/std_regex.html), std.regex.replace is passed a delegate of type auto delegate(Captures!string), which does not compile. The definition of Captures in Phobos is struct Captures(R,DIndex), and for the purposes of these examples, changing the delegate to auto delegate(Captures!(string,uint)) seems to work. Is this correct?

Another example on Dmitry's page starts:

    auto m = match("Ranges are hot!", r"(\w)\w*(\w)"); //at least 3 "word" symbols

The output from the example is "Ranges, R, s", but I don't quite understand why those were the matches in this case. Also, doesn't the regular expression actually require at least 2 "word" symbols, since \w* means match 0 or more "word" symbols?

These newsgroups are a great resource, keep up the great work!
Mar 21 2012
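A minimal sketch of the callback form asked about above, assuming a newer Phobos in which Captures!string compiles without an explicit index type. It uses replaceAll with a named function rather than the 2.058-era replace with a delegate literal; replaceAll is a later addition, not the API this thread discusses. swapPair, swapAll and the key=value pattern are hypothetical, chosen only to show that the callback takes a plain Captures!string.

```d
import std.regex : Captures, regex, replaceAll;
import std.stdio : writeln;

// Callback invoked once per match. The parameter type is Captures!string,
// with no explicit index type.
string swapPair(Captures!string m)
{
    // m[0] is the whole match; m[1] and m[2] are the submatches.
    return m[2] ~ "=" ~ m[1];
}

// Hypothetical helper: rewrite every "key=value" pair as "value=key".
string swapAll(string input)
{
    auto re = regex(`(\w+)=(\w+)`);
    return replaceAll!swapPair(input, re);
}

void main()
{
    writeln(swapAll("a=1, b=2")); // prints 1=a, 2=b
}
```

Text between matches (here ", ") is copied through unchanged; only the matched pairs are replaced by the callback's return value.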
On 21.03.2012 20:05, James Oliphant wrote:
> While following the regex discussion, I have been compiling the examples to help with my understanding of how it works. From Dmitry's example page: http://blackwhale.github.com/regular-expression.html and from the dlang.org website: http://dlang.org/phobos/std_regex.html std.regex.replace calls a delegate auto delegate(Captures!string) which does not compile. The definition in Phobos for Captures is struct Captures(R,DIndex) and for the purposes of these examples changing the delegate to auto delegate(Captures!(string,uint)) seems to work. Is this correct?

Mm-hm, that means the fix to use size_t by default is in upstream, but not in 2.058, I think. The user shouldn't need to specify the index type; it is a hook for future extension.

> In another example on Dmitry's page that starts: auto m = match("Ranges are hot!", r"(\w)\w*(\w)"); //at least 3 "word" symbols The output from the example is "Ranges, R, s", but I don't quite understand why those where the matches in this case.

Ok, \w matches any single word character, that is, an alphabetic or numeric character or one of a few other oddities*. Now, the first (\w) captures 1 character into the 1st _submatch_ ('R'). \w* then consumes the rest of the word ("anges"), but that gets reverted by one character (backtracking) so that the next (\w) can match. The second (\w) thus captures the last char ('s') into the 2nd _submatch_. captures lists the submatches captured during one match; [0] is the whole match ("Ranges").

I get that people tend to think I was about to show multiple _matches_ here, but that belongs to the next chapter. Here I was just showing how to work with submatches; that needs to be stressed somehow.

*This is an enormously useful tool for getting info on Unicode stuff, and on regex in particular: http://unicode.org/cldr/utility/index.jsp

> Also does the regular expression imply match at least 2 "word" symbols where \w* means match 0 or more "word" symbols?

Yup, that's right: at least 2. I should correct the wording.

> These newsgroups are a great resource, keep up the great work!

You are welcome.

-- Dmitry Olshansky
Mar 21 2012
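A small sketch of the submatch explanation above, assuming the newer matchFirst function in place of the 2.058-era match (an assumption: matchFirst postdates this thread). parts is a hypothetical helper that returns the whole match followed by each submatch. The second pattern adds an extra group around \w* purely for illustration, to show that \w* ends up holding "ange" after giving one character back; the last two calls check the "at least 2 word symbols" reading.

```d
import std.regex : matchFirst;
import std.stdio : writeln;

// Hypothetical helper: the first match of pattern in input, as
// [whole match, submatch 1, submatch 2, ...]; empty if nothing matched.
string[] parts(string input, string pattern)
{
    string[] result;
    auto c = matchFirst(input, pattern);
    if (c.empty)
        return result;
    foreach (i; 0 .. c.length)
        result ~= c[i];
    return result;
}

void main()
{
    // One match: [0] is the whole match, [1] and [2] are the submatches.
    writeln(parts("Ranges are hot!", `(\w)\w*(\w)`));
    // Extra group around \w* (illustration only) shows what it kept after backtracking.
    writeln(parts("Ranges are hot!", `(\w)(\w*)(\w)`));
    // "I" alone is only 1 word character, so the first match is "am".
    writeln(parts("I am", `(\w)\w*(\w)`));
    // A single word character produces no match at all.
    writeln(parts("a", `(\w)\w*(\w)`));
}
```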
On 21.03.2012 21:13, Dmitry Olshansky wrote:
> I get it that people tend to think that I was about to show multiple _matches_ here, but that belongs to the next chapter.

Oh wait, it's in this chapter :) I probably should make more noise about the "g" flag, and separate submatches from the range of matches more cleanly.

-- Dmitry Olshansky
Mar 21 2012
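To separate the range of matches from the submatches of a single match, as suggested above, here is a sketch using matchAll, which in newer Phobos takes the place of matching with a regex compiled with the "g" flag (an assumption about the reader's Phobos version; this thread predates matchAll). firstAndLast is a hypothetical helper: each element of the range is one match, and [1] and [2] index the submatches inside that match.

```d
import std.regex : matchAll;
import std.stdio : writeln;

// Hypothetical helper: for every match of (\w)\w*(\w), join its two
// submatches (the first and last character of the word).
string[] firstAndLast(string input)
{
    string[] result;
    // matchAll yields a range of matches; each element is a Captures
    // object holding the submatches of that one match.
    foreach (c; matchAll(input, `(\w)\w*(\w)`))
        result ~= c[1] ~ c[2];
    return result;
}

void main()
{
    // Three matches ("Ranges", "are", "hot"), each with its own submatches.
    writeln(firstAndLast("Ranges are hot!"));
}
```

Single-character words never appear in the result, since the pattern needs at least 2 word characters per match.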