digitalmars.D.learn - Very Stupid Regex question
- seany (18/18) Aug 07 2014 Consider please the following:
- "Marc Schütz" <schuetzm gmx.net> (8/26) Aug 07 2014 It's not clear to me what exactly you want, but:
- Justin Whear (10/14) Aug 07 2014 You're not really using regexes properly. You want to greedily match as...
- seany (4/21) Aug 07 2014 thing is, abcd is read from a file, and in the compile time, i
- H. S. Teoh via Digitalmars-d-learn (18/41) Aug 07 2014 So basically you have a file containing regex patterns, and you want to
- Justin Whear (7/15) Aug 07 2014 (patterns);
- H. S. Teoh via Digitalmars-d-learn (10/28) Aug 07 2014 Hmm, you're right. I was a bit disappointed to find out that the |
- H. S. Teoh via Digitalmars-d-learn (8/15) Aug 07 2014 https://issues.dlang.org/show_bug.cgi?id=13268
- seany (3/5) Aug 07 2014 Thank you soooooooooo much!!
Please consider the following:

    string s1 = "PREabcdPOST";
    string s2 = "PREabPOST";
    string[] srar = ["ab", "abcd"]; // this cannot be constructed in a particular order
    foreach (sr; srar)
    {
        auto r = regex(sr, "g");
        auto m = matchFirst(s1, r);
        break;
        // this one matches ab
        // but I want this to match abcd
        // and for s2 I want to match ab
    }

Obviously there are ways like counting the match length and then using the maximum length, instead of breaking as soon as a match is found. Are there any better ways?
Aug 07 2014
On Thursday, 7 August 2014 at 16:05:17 UTC, seany wrote:
> Consider please the following:
>
>     string s1 = "PREabcdPOST";
>     string s2 = "PREabPOST";
>     string[] srar = ["ab", "abcd"]; // this can not be constructed with a particular order
>     foreach (sr; srar)
>     {
>         auto r = regex(sr, "g");
>         auto m = matchFirst(s1, r);
>         break;
>         // this one matches ab
>         // but I want this to match abcd
>         // and for s2 I want to match ab
>     }
>
> obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found. Are there any other better ways?

It's not clear to me what exactly you want, but: are the regexes in `srar` related? That is, does one regex always include the previous one as a prefix? Then you can use optional matches:

    /ab(cd)?/

This will match "abcd" if it is there, but will also match "ab" otherwise.
Aug 07 2014
On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
> obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found. Are there any other better ways?

You're not really using regexes properly. You want to greedily match as much as possible in this case, e.g.:

    void main()
    {
        import std.regex;
        auto re = regex("ab(cd)?");
        assert("PREabcdPOST".matchFirst(re).hit == "abcd");
        assert("PREabPOST".matchFirst(re).hit == "ab");
    }
Aug 07 2014
On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:
> On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
>> obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found. Are there any other better ways?
>
> You're not really using regexes properly. You want to greedily match as much as possible in this case, e.g.:
>
>     void main()
>     {
>         import std.regex;
>         auto re = regex("ab(cd)?");
>         assert("PREabcdPOST".matchFirst(re).hit == "abcd");
>         assert("PREabPOST".matchFirst(re).hit == "ab");
>     }

The thing is, abcd is read from a file, and at compile time I don't know whether cd will be there at all, or whether it should be ab(ef) instead.
Aug 07 2014
On Thu, Aug 07, 2014 at 04:49:05PM +0000, seany via Digitalmars-d-learn wrote:
> On Thursday, 7 August 2014 at 16:12:59 UTC, Justin Whear wrote:
>> On Thu, 07 Aug 2014 16:05:16 +0000, seany wrote:
>>> obviously there are ways like counting the match length, and then using the maximum length, instead of breaking as soon as a match is found. Are there any other better ways?
>>
>> You're not really using regexes properly. You want to greedily match as much as possible in this case, e.g.:
>>
>>     void main()
>>     {
>>         import std.regex;
>>         auto re = regex("ab(cd)?");
>>         assert("PREabcdPOST".matchFirst(re).hit == "abcd");
>>         assert("PREabPOST".matchFirst(re).hit == "ab");
>>     }
>
> thing is, abcd is read from a file, and in the compile time, i dont know if cd may at all be there or not, or if it should be ab(ef)

So basically you have a file containing regex patterns, and you want to find the longest match among them? One way to do this is to combine them at runtime:

    import std.algorithm, std.format, std.regex;

    string[] patterns = ... /* read from file, etc. */;

    // Longer patterns match first
    patterns.sort!((a,b) => a.length > b.length);

    // Build regex: each pattern is wrapped in parentheses and joined with |
    string regexStr = "%((%(%c%))%||%)".format(patterns);
    auto re = regex(regexStr);

    ...

    // Run matches against input
    char[] input = ...;
    auto m = input.match(re);
    auto matchedString = m.captures[0];

T

-- 
When solving a problem, take care that you do not become part of the problem.
Aug 07 2014
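For readers who want to try the combined-alternation idea end to end, here is a minimal sketch, assuming the patterns are plain literals such as "ab" and "abcd". The helper name `buildAlternation` is made up for this illustration and is not part of std.regex; the format string is the one from the post above.

```d
import std.algorithm : sort;
import std.format : format;
import std.regex : matchFirst, regex;

/// Hypothetical helper: joins the patterns into one alternation,
/// longest pattern first, e.g. ["ab", "abcd"] becomes "(abcd)|(ab)".
string buildAlternation(string[] patterns)
{
    // Sort a copy so the caller's array keeps its order.
    auto sorted = patterns.dup;
    sorted.sort!((a, b) => a.length > b.length)();
    // %( ... %| ... %) formats every element and puts "|" between elements;
    // the inner %(%c%) writes a string's characters without quotes or escapes.
    return "%((%(%c%))%||%)".format(sorted);
}

void main()
{
    auto re = regex(buildAlternation(["ab", "abcd"]));
    assert("PREabcdPOST".matchFirst(re).hit == "abcd");
    assert("PREabPOST".matchFirst(re).hit == "ab");
}
```

Sorting by pattern length only approximates "longest match": it ranks the pattern text, not the text a pattern can match, so it is reliable only when the patterns contain no regex operators.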
On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn wrote:
> So basically you have a file containing regex patterns, and you want to find the longest match among them?
>
>     // Longer patterns match first
>     patterns.sort!((a,b) => a.length > b.length);
>
>     // Build regex
>     string regexStr = "%((%(%c%))%||%)".format(patterns);
>     auto re = regex(regexStr);

This only works if the patterns are simple literals. E.g. the pattern 'a+' might match a longer sequence than 'aaa'. If you're out for the longest possible match, iteratively testing each pattern is probably the way to go.
Aug 07 2014
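The "test each pattern and keep the longest" approach could look roughly like the sketch below. `longestHit` is a hypothetical name chosen for this example, and the pattern list is made up. Note that `matchFirst` reports the leftmost match of each pattern, so this compares the leftmost hits only, not every possible match in the input.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper: returns the longest hit that any single pattern
/// produces on `input`, or null when no pattern matches.
string longestHit(string input, string[] patterns)
{
    string best = null;
    foreach (p; patterns)
    {
        auto m = input.matchFirst(regex(p));
        // An empty Captures object means this pattern did not match at all.
        if (!m.empty && m.hit.length > best.length)
            best = m.hit;
    }
    return best;
}

void main()
{
    // The order of the patterns does not matter here.
    auto patterns = ["ab", "abcd", "a+"];
    assert(longestHit("PREabcdPOST", patterns) == "abcd");
    assert(longestHit("PREabPOST", patterns) == "ab");
    assert(longestHit("PREaaaaaPOST", patterns) == "aaaaa");
}
```

This costs one regex compilation and one scan per pattern, which is probably fine for a handful of patterns read from a file; compiling the regexes once and reusing them would be the obvious next step for larger inputs.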
On Thu, Aug 07, 2014 at 05:33:42PM +0000, Justin Whear via Digitalmars-d-learn wrote:
> On Thu, 07 Aug 2014 10:22:37 -0700, H. S. Teoh via Digitalmars-d-learn wrote:
>> So basically you have a file containing regex patterns, and you want to find the longest match among them?
>>
>>     // Longer patterns match first
>>     patterns.sort!((a,b) => a.length > b.length);
>>
>>     // Build regex
>>     string regexStr = "%((%(%c%))%||%)".format(patterns);
>>     auto re = regex(regexStr);
>
> This only works if the patterns are simple literals. E.g. the pattern 'a+' might match a longer sequence than 'aaa'. If you're out for the longest possible match, iteratively testing each pattern is probably the way to go.

Hmm, you're right. I was a bit disappointed to find out that the | operator in std.regex (and also in Perl's regex) doesn't do longest-match but first-match. :-( I had always thought it did longest-match, like in lex/flex.

I wish we could extend std.regex to allow longest-match for alternations... but there may be performance consequences.

T

-- 
There's light at the end of the tunnel. It's the oncoming train.
Aug 07 2014
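The first-match behaviour of | described above can be seen with the two literals from the original question. This is a small sketch; the expected hits follow the first-match semantics discussed in this thread.

```d
import std.regex : matchFirst, regex;

void main()
{
    // Shorter alternative listed first: it wins, although "abcd" would also match here.
    assert("PREabcdPOST".matchFirst(regex("ab|abcd")).hit == "ab");
    // Longer alternative listed first: now the longer text is matched.
    assert("PREabcdPOST".matchFirst(regex("abcd|ab")).hit == "abcd");
}
```

A lex/flex-style longest-match rule would report "abcd" in both cases, which is why the order of the alternatives matters when the combined regex is built.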
On Thu, Aug 07, 2014 at 10:42:13AM -0700, H. S. Teoh via Digitalmars-d-learn wrote:
[...]
> Hmm, you're right. I was a bit disappointed to find out that the | operator in std.regex (and also in Perl's regex) doesn't do longest-match but first-match. :-( I had always thought it did longest-match, like in lex/flex.
>
> I wish we can extend std.regex to allow longest-match for alternations... but there may be performance consequences.

https://issues.dlang.org/show_bug.cgi?id=13268

T

-- 
Valentine's Day: an occasion for florists to reach into the wallets of nominal lovers in dire need of being reminded to profess their hypothetical love for their long-forgotten.
Aug 07 2014
On Thursday, 7 August 2014 at 18:16:11 UTC, H. S. Teoh via Digitalmars-d-learn wrote:
> https://issues.dlang.org/show_bug.cgi?id=13268
>
> T

Thank you soooooooooo much!!
Aug 07 2014