digitalmars.D.learn - regex with literal (ie automatically replace '(' with '\(', etc) )
- Timothee Cour (18/18) May 29 2013 See below:
- timotheecour (21/47) May 29 2013 something like this, which we should have in std.regex:
- Diggory (5/25) May 29 2013 That would be good (although you missed a few :P)
- Timothee Cour (12/41) May 29 2013 ok, here it is:
- Diggory (5/21) May 30 2013 According to this:
- Timothee Cour (9/32) May 30 2013 can use the same escape sequences for both (\c -> c in the replacement
- Dmitry Olshansky (8/40) May 30 2013 Indeed replace format string is a different beast. I can't recall if I
- Diggory (2/9) May 30 2013 Either the doc or the code should probably be changed then so
- Dmitry Olshansky (14/54) May 30 2013 Yes, please. It was a blind spot for a long time. Strictly speaking I
See below:

    import std.stdio;
    import std.regex;

    void main(){
        // this works, but I need to specify the escape manually
        "h(i".replace!(a=>a.hit~a.hit)(regex(`h\(`,"g")).writeln;
        // I'd like this to work with a flag, say 'l' (lowercase L) as in 'literal'.
        // "h(i".replace!(a=>a.hit~a.hit)(regex(`h(`,"gl")).writeln;
    }

Note: std.array.replace doesn't work here because I want to be able to use std.regex's replace with the delegate functionality shown above.

This is especially useful when the pattern passed to regex is itself an input argument (i.e. unknown in advance) and we want to escape it properly.

Alternatively (and perhaps more generally), could we have a function:

    string toRegexLiteral(string){
        // replace all regex special characters (like '(') with their escaped equivalent
    }
May 29 2013
Something like this, which we should have in std.regex:

    string escapeRegex(string a){
        import std.string;
        enum transTable = ['[' : `\[`, '|' : `\|`, '*': `\*`, '+': `\+`, '?': `\?`, '(': `\(`, ')': `\)`];
        return translate(a, transTable);
    }

    string escapeRegexReplace(string a){
        import std.string;
        // enum transTable = ['$' : `$$`, '\\' : `\\`];
        enum transTable = ['$' : `$$`];
        return translate(a, transTable);
    }

    unittest{
        string a=`asdf(def[ghi]+*|)`;
        // b was undeclared in the post as written; this value is an assumption,
        // any replacement text containing '$' exercises the escape
        string b=`x$y`;
        assert(match(a,regex(escapeRegex(a))).hit==a);
        auto s=replace(a,regex(escapeRegex(a)),escapeRegexReplace(b));
        assert(s==b);
    }

On Wednesday, 29 May 2013 at 23:28:19 UTC, Timothee Cour wrote:
> Alternatively, (and perhaps more generally), could we have a function:
> string toRegexLiteral(string)
May 29 2013
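For readers following along, the table-based proposal above can also be written as a single pass over the input. This is only a minimal sketch: `escapeRegexLiteral` is a hypothetical name, not a Phobos function, and the set of characters treated as special is an assumption that should be checked against the std.regex documentation.

```d
import std.algorithm.searching : canFind;
import std.array : appender;
import std.regex : matchFirst, regex;

// Hypothetical helper (not part of Phobos): puts a backslash in front of
// every character assumed to be special in a std.regex pattern.
string escapeRegexLiteral(string s)
{
    // Assumed metacharacter set; verify against the std.regex docs.
    enum specials = `\.^$*+?()[]{}|`;
    auto buf = appender!string();
    foreach (dchar c; s)
    {
        if (specials.canFind(c))
            buf.put('\\');   // escape prefix
        buf.put(c);          // the character itself
    }
    return buf.data;
}

unittest
{
    // The escaped text, used as a pattern, should match the text itself.
    string text = `asdf(def[ghi]+*|)`;
    auto m = matchFirst(text, regex(escapeRegexLiteral(text)));
    assert(!m.empty);
    assert(m.hit == text);
}
```

Escaping the backslash itself matters when the input is arbitrary user text, since an unescaped `\` would otherwise start an escape sequence in the pattern.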
On Wednesday, 29 May 2013 at 23:33:30 UTC, timotheecour wrote:
> something like this, which we should have in std.regex:

That would be good (although you missed a few :P)

Technically any working "escapeRegex" would also function as a valid "escapeRegexReplace", although it might be slightly faster to have a specialised one.
May 29 2013
ok, here it is:
https://github.com/timotheecour/dtools/blob/master/dtools/util/util.d#L78

I simplified the implementation and added the missing escape symbols. Is any symbol still missing? I based the list on http://dlang.org/phobos/std_regex.html, table entry '\c where c is one of', but that entry was incomplete. I also note that the table entry 'any character except' is incomplete.

On Wed, May 29, 2013 at 8:32 PM, Diggory wrote:
> Technically any working "escapeRegex" would also function as a valid
> "escapeRegexReplace", although it might be slightly faster to have a
> specialised one.

Not sure, because they escape differently (\$ vs $$).

Shall I do a pull request for std.regex?
May 29 2013
On Thursday, 30 May 2013 at 06:50:06 UTC, Timothee Cour wrote:
> not sure, because they escape differently (\$ vs $$).

According to this:
... can use the same escape sequences for both (\c -> c in the replacement string).
May 30 2013
On Thu, May 30, 2013 at 1:14 AM, Diggory wrote:
> ... can use the same escape sequences for both (\c -> c in the replacement
> string).

Your suggestion does not work; try it yourself by replacing the $$ with \$ in my code. Is that a bug in the std.regex docs? E.g.:

    replace("",regex(``),`\$`);
    => invalid format string in regex replace

However, everything works fine with $$; see my code above.
May 30 2013
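To make the `$$` behaviour reported above concrete, here is a small sketch that doubles every `$` in a replacement string before handing it to std.regex. `escapeReplacement` is a hypothetical name used only for illustration; the sketch relies on `$$` producing a literal dollar sign in the replace format, which is what the post above reports.

```d
import std.array : replace;            // plain substring replace, no regex
import std.regex : regex, replaceFirst;

// Hypothetical helper: doubles each '$' so the std.regex replace format
// emits a literal dollar sign instead of starting a $1 / $& style reference.
string escapeReplacement(string s)
{
    return s.replace("$", "$$");
}

unittest
{
    auto re = regex(`price`);
    // Without the escape, "$5" would be read as a reference to group 5.
    auto result = replaceFirst("the price", re, escapeReplacement("$5"));
    assert(result == "the $5");
}
```

Only `std.array.replace` is imported here, so the call inside `escapeReplacement` cannot be confused with the regex-based `replace`.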
30-May-2013 14:24, Timothee Cour wrote:
> Your suggestion does not work; try for yourself by replacing the $$ by \$
> in my code. Is that a bug in std.regex' doc? eg:
> replace("",regex(``),`\$`);
> => invalid format string in regex replace

Indeed, the replace format string is a different beast. I can't recall whether I stole it from the original std.regex or devised this $$ myself.

At any rate, replace(fmt, `\$`, "$$") would work, or the same with replace from std.string. So I feel it's a bit of a stretch to include a function for such a narrow case.

-- 
Dmitry Olshansky
May 30 2013
> Your suggestion does not work; try for yourself by replacing the $$ by \$
> in my code. Is that a bug in std.regex' doc? eg:
> replace("",regex(``),`\$`);
> => invalid format string in regex replace
> However everything works fine with $$, see my code above.

Either the doc or the code should probably be changed, then, so that they are consistent.
May 30 2013
30-May-2013 10:49, Timothee Cour wrote:
> ok, here it is:
> https://github.com/timotheecour/dtools/blob/master/dtools/util/util.d#L78
> simplified implementation and added missing escape symbols. Any symbol
> missing?

One thing is missing: '.', which should become '\.'.

> not sure, because they escape differently (\$ vs $$).
> shall i do a pull request for std.regex?

Yes, please. It was a blind spot for a long time. Strictly speaking, I think that a generic escaping routine would work:

    auto escape(S1, S2, C)(S1 src, S2 escapables, C escape='\\')
        if(isSomeString!S1 && isSomeString!S2 && isSomeChar!C)
    {
        ....
    }

Do we have something like this in std.string? Then all we need is a convenience wrapper in std.regex?

BTW, unescape is just as important.

-- 
Dmitry Olshansky
May 30 2013
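One possible shape for the generic escape/unescape pair suggested above, as a hedged sketch only. It is simplified to plain `string` and `char` rather than the templated signature from the post, the names `escapeChars` and `unescapeChars` are hypothetical, and the decision to always escape the escape character itself is an assumption made so that the two functions round-trip.

```d
import std.algorithm.searching : canFind;
import std.array : appender;

// Hypothetical helper: prefixes every character found in `escapables`
// (and the escape character itself) with `esc`.
string escapeChars(string src, string escapables, char esc = '\\')
{
    auto buf = appender!string();
    foreach (dchar c; src)
    {
        if (c == esc || escapables.canFind(c))
            buf.put(esc);
        buf.put(c);
    }
    return buf.data;
}

// Hypothetical inverse: drops each escape character and keeps the character
// that follows it. A lone trailing escape character is silently dropped.
string unescapeChars(string src, char esc = '\\')
{
    auto buf = appender!string();
    bool pending = false;   // true right after an unconsumed escape character
    foreach (dchar c; src)
    {
        if (!pending && c == esc)
        {
            pending = true;
            continue;
        }
        buf.put(c);
        pending = false;
    }
    return buf.data;
}

unittest
{
    auto escaped = escapeChars(`a.b(c)`, `.()`);
    assert(escaped == `a\.b\(c\)`);
    assert(unescapeChars(escaped) == `a.b(c)`);
}
```

A regex-specific wrapper would then only need to pass the list of pattern metacharacters as `escapables`.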