digitalmars.D.learn - Issues with std.regex
- MrAppleseed (46/46) Feb 16 2013 Hey all,
- FG (2/13) Feb 16 2013 Perhaps try this: "[ 0-9a-zA-Z.*=+-;()\"\'\\[\\]<>,{}^#/\\]"
- MrAppleseed (10/31) Feb 16 2013 Hey,
- jerro (5/7) Feb 16 2013 The problem here is that you have \ right before the ] at the end
- MrAppleseed (10/18) Feb 20 2013 Sorry for the delay in response,
- MrAppleseed (9/9) Feb 20 2013 Hello to everyone, and thank you for your help!
- FG (5/11) Feb 16 2013 Ah, right. Sorry for that. You'd need as much as 4 backslashes there. :)
- Dmitry Olshansky (12/23) Feb 17 2013 Like others noted the problem is 2-fold:
- H. S. Teoh (10/34) Feb 16 2013 The problem is that you're using D's double-quoted string literal, which
- Namespace (1/1) Feb 16 2013 As long as there is \" I get the same error.
- MrAppleseed (16/64) Feb 16 2013 Thanks for the quick reply!
- jerro (3/8) Feb 16 2013 You need to put \ in front of [ or ] if you want to match those
Hey all,

I'm currently trying to port a small toy language that I invented a while back in Java to D. However, a main part of my lexical analyzer was regular expression matching, which I've been having issues with in D. The regular expression in question is as follows:

This works well enough in Java to produce a series of tokens that I could then pass to my parser. But when I tried to port this into D, I almost always get an error when using brackets, braces, or parentheses. I've tried several different combinations, have looked through the std.regex library reference, have Googled this issue, have tested my regular expression in several online regex testers (primarily http://regexpal.com/ and http://regexhelper.com/), and have even looked it up in the book "The D Programming Language" (good book, by the way), yet I still can't get it working right.

Here's the code I've been using:

    ...
    auto tempCont = cast(char[])read(location, fileSize);
    string contents = cast(string)tempCont;
    auto m = match(contents, reg);
    auto token = m.captures
    ...

When I try to run the code above, I get:

    parser.d(64): Error: undefined escape sequence \[
    parser.d(64): Error: undefined escape sequence \]

When I remove the escaped characters (turning my regex into compiling or linking. However, on first run, I get the following error (I cut the error short; the full error is pasted at http://pastebin.com/vjMhkx4N):

    std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): wrong CodepointSet
    Pattern with error: `[ 0-9a-zA-Z.*=+-;()"'[]` <--HERE--

I'm very confused about what to do, and much of the information in the library reference seems to contradict what I'm doing. Any help would be greatly appreciated!

Thanks!
~Mr. Appleseed

Additional information:
OS/Compiler information: Ubuntu 12.10 x64, DMD64 D Compiler v2.061
Compiled with: dmd main.d parser.d
Feb 16 2013
On 2013-02-16 21:22, MrAppleseed wrote:
> When I try to run the code above, I get:
> parser.d(64): Error: undefined escape sequence \[
> parser.d(64): Error: undefined escape sequence \]
> When I remove the escaped characters (turning my regex into
> However, on first run, I get the following error (I cut the error short, full error is pasted http://pastebin.com/vjMhkx4N):
> std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): wrong CodepointSet

Perhaps try this: "[ 0-9a-zA-Z.*=+-;()\"\'\\[\\]<>,{}^#/\\]"
Feb 16 2013
On Saturday, 16 February 2013 at 20:33:15 UTC, FG wrote:
> On 2013-02-16 21:22, MrAppleseed wrote:
>> When I try to run the code above, I get:
>> parser.d(64): Error: undefined escape sequence \[
>> parser.d(64): Error: undefined escape sequence \]
>> When I remove the escaped characters (turning my regex into compiling or linking. However, on first run, I get the following error (I cut the error short, full error is pasted http://pastebin.com/vjMhkx4N):
>> std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): wrong CodepointSet
>> Pattern with error: `[ 0-9a-zA-Z.*=+-;()"'[]` <--HERE--

Hey,

Thanks for the reply! You guys are quite the friendly people. :)

I made the changes you suggested above, and although it compiled fine, on the first run I got a similar error:

    std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): unexpected end of CodepointSet
    <--HERE-- ``

(Full error is here: http://pastebin.com/rTmHuVjG)
Feb 16 2013
> <--HERE-- ``

The problem here is that you have \ right before the ] at the end of the string. Because it is preceded by \, the ] is interpreted as a character you are matching on, not as the closing bracket for the initial [. If you want to match \ you need this:
Feb 16 2013
On Saturday, 16 February 2013 at 21:58:23 UTC, jerro wrote:
>> <--HERE-- ``
> The problem here is that you have \ right before the ] at the end of the string. Because it is preceded by \, the ] is interpreted as a character you are matching on, not as the closing bracket for the initial [. If you want to match \ you need this:

Sorry for the delay in response.

As you can read in the original post, I have tried the suggestions in both of your comments (I couldn't figure out how to reply to both, unfortunately), and both caused errors. The code you suggested is one of the first I tried,( ), yet I still got that error. I believe that the regex engine changed the "\\" into a single backslash "\", which is what is displayed in the error you quoted.
Feb 20 2013
Hello to everyone, and thank you for your help!

Sorry for the delay in response, as I was busy with family matters. However, upon returning today, and with everyone's help, I have successfully gotten it to work. The code below worked out swimmingly:

    auto m = match(contents, reg);
    auto token = m.captures;

Once again, thank you all for your help! :)
Feb 20 2013
On 2013-02-16 22:36, MrAppleseed wrote:
> I made the changes you suggested above, and although it compiled fine, on the first run I got a similar error:
> std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): unexpected end of CodepointSet

Ah, right. Sorry for that. You'd need as many as 4 backslashes there. :)
It ain't pretty, so it's better to go with raw strings, but apparently there are some problems with them right now, looking at the other posts here, right?
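To illustrate the backslash counting mentioned here, a minimal sketch, assuming a current Phobos where `matchFirst` is available (the thread itself uses the older `match`). The helper `hasBackslash` is hypothetical and is not code from any poster:

```d
import std.regex;

// Hypothetical helper (not from the thread): true if `s` contains a backslash.
bool hasBackslash(string s)
{
    // Written as a WYSIWYG (backtick) string, the engine receives [\\]:
    // a character class holding one escaped backslash.
    auto re = regex(`[\\]`);
    return !matchFirst(s, re).empty;
}

void main()
{
    assert(hasBackslash(`C:\temp`));
    assert(!hasBackslash("no backslash here"));

    // In a double-quoted literal every backslash must itself be doubled,
    // so the same two-backslash pattern takes four in the source.
    assert("[\\\\]" == `[\\]`);
}
```

The last assertion is the "4 backslashes" case: the compiler halves them once, and the regex engine then reads the remaining pair as one literal backslash.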
Feb 16 2013
17-Feb-2013 01:36, MrAppleseed wrote:
> On Saturday, 16 February 2013 at 20:33:15 UTC, FG wrote:
>> On 2013-02-16 21:22, MrAppleseed wrote:
>>> When I try to run the code above, I get:
>>> parser.d(64): Error: undefined escape sequence \[
>>> parser.d(64): Error: undefined escape sequence \]
>>> When I remove the escaped characters (turning my regex into linking.

As others noted, the problem is two-fold:
- Escaping special characters (both as a D string literal and as regex escaping itself). So just use `` or r"" or some other form of WYSIWYG string *and* escape the things that are part of regex syntax.
- [] are used for nesting, as std.regex supports set-wise operations inside a [...] character class, e.g. [[A-Z]&&[A-D]] means intersection and would yield the set [A-D]. It gets more useful with Unicode character sets.

--
Dmitry Olshansky
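The set intersection described in the second point can be tried out with a short sketch. This assumes a current Phobos with `matchFirst`; the helper name `inIntersection` and the `^...+$` anchoring are illustrative additions, and only the `[[A-Z]&&[A-D]]` class comes from the post:

```d
import std.regex;

// Hypothetical helper: true if every character of `s` lies in both
// [A-Z] and [A-D], i.e. in the intersection of the two sets.
bool inIntersection(string s)
{
    // Backtick string, so the pattern reaches std.regex unchanged.
    // The inner [ ... ] pairs are nested sets, not literal brackets.
    auto re = regex(`^[[A-Z]&&[A-D]]+$`);
    return !matchFirst(s, re).empty;
}

void main()
{
    assert(inIntersection("ABCD"));
    assert(!inIntersection("ABCE")); // E is in [A-Z] but not in [A-D]
}
```

Because an unescaped [ inside a class opens a nested set, a literal bracket inside a class has to be escaped.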
Feb 17 2013
On Sat, Feb 16, 2013 at 09:22:07PM +0100, MrAppleseed wrote:
> Hey all,
>
> I'm currently trying to port my small toy language I invented awhile back in Java to D. However, a main part of my lexical analyzer was regular expression matching, which I've been having issues with in D. The regex expression in question is as follows:
>
> This works well enough in Java to produce a series of tokens that I could then pass to my parser. But when I tried to port this into D, I almost always get an error when using brackets, braces, or parenthesis. I've tried several different combinations, have looked through the std.regex library reference, have Googled this issue, have tested my regular expression in several online-regex testers (primarily http://regexpal.com/, and http://regexhelper.com/), and have even looked it up in the book, "The D Programming Language" (good book, by the way), yet I still can't get it working right.
>
> Here's the code I've been using:
> ...
> auto tempCont = cast(char[])read(location, fileSize);
> string contents = cast(string)tempCont;

The problem is that you're using D's double-quoted string literal, which adds another level of interpretation to the \'s. What you should do is to use the backtick string literal, which does *not* interpret backslashes:

If you have trouble typing `, you can also use r"...", which means the same thing.

Hope this helps.

--T
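The difference between the literal forms can be checked directly. A minimal sketch, with variable names chosen only for illustration:

```d
void main()
{
    // Double-quoted: D processes escape sequences first,
    // so \\ in the source becomes a single backslash in the string.
    string quoted = "\\d+";

    // Backtick and r"..." literals are WYSIWYG: backslashes pass through untouched.
    string backtick = `\d+`;
    string raw = r"\d+";

    assert(quoted == backtick);
    assert(backtick == raw);
    assert(raw.length == 3); // backslash, 'd', '+'

    // string bad = "\d+"; // would not compile: undefined escape sequence \d
}
```

The commented-out line is the same kind of compile-time error as the "undefined escape sequence \[" reported in the original post.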
Feb 16 2013
As long as there is \" I get the same error.
Feb 16 2013
On Saturday, 16 February 2013 at 20:35:48 UTC, H. S. Teoh wrote:
> On Sat, Feb 16, 2013 at 09:22:07PM +0100, MrAppleseed wrote:
>> Hey all,
>>
>> I'm currently trying to port my small toy language I invented awhile back in Java to D. However, a main part of my lexical analyzer was regular expression matching, which I've been having issues with in D. The regex expression in question is as follows:
>>
>> This works well enough in Java to produce a series of tokens that I could then pass to my parser. But when I tried to port this into D, I almost always get an error when using brackets, braces, or parenthesis. I've tried several different combinations, have looked through the std.regex library reference, have Googled this issue, have tested my regular expression in several online-regex testers (primarily http://regexpal.com/, and http://regexhelper.com/), and have even looked it up in the book, "The D Programming Language" (good book, by the way), yet I still can't get it working right.
>>
>> Here's the code I've been using:
>> ...
>> auto tempCont = cast(char[])read(location, fileSize);
>> string contents = cast(string)tempCont;
>
> The problem is that you're using D's double-quoted string literal, which adds another level of interpretation to the \'s. What you should do is to use the backtick string literal, which does *not* interpret backslashes:
>
> If you have trouble typing `, you can also use r"...", which means the same thing.
>
> Hope this helps.
>
> --T

Thanks for the quick reply!

I replaced the double-quotes with backticks and compiled it with no problems, but on the first run I got a similar error:

    std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): invalid escape sequence
    Pattern with error: `[ 0-9a-zA-Z.*=+-;()\"` <--HERE--

After removing the invalid escape sequence, I compiled it, once again with no problems, and attempted to run it, but I got the same error as before:

    std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): wrong CodepointSet
    Pattern with error: `[ 0-9a-zA-Z.*=+-;()"'[]` <--HERE--

(Entire error here: http://pastebin.com/Su9XzbXW)
Feb 16 2013
> std.regex.RegexException /usr/include/dmd/phobos/std/regex.d(1942): wrong CodepointSet
> Pattern with error: `[ 0-9a-zA-Z.*=+-;()"'[]` <--HERE--
> (Entire error here: http://pastebin.com/Su9XzbXW)

You need to put \ in front of [ or ] if you want to match those two characters. The relevant part of the std.regex documentation:

    \c where c is one of [|*+?()    Matches the character c itself.
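A minimal sketch of escaping the brackets inside a character class, assuming a current Phobos with `matchFirst`. The helper `firstBracket` is hypothetical and the pattern is deliberately much smaller than the one in the original post:

```d
import std.regex;

// Hypothetical helper: returns the first [ or ] found in `s`, or "" if there is none.
string firstBracket(string s)
{
    // Backtick string keeps the backslashes for the regex engine,
    // which then reads \[ and \] as literal bracket characters.
    auto re = regex(`[\[\]]`);
    auto m = matchFirst(s, re);
    return m.empty ? "" : m.hit;
}

void main()
{
    assert(firstBracket("a[1]") == "[");
    assert(firstBracket("1]") == "]");
    assert(firstBracket("abc") == "");
}
```

Written as a double-quoted literal, the same pattern would be "[\\[\\]]", since the compiler consumes one level of backslashes before std.regex sees the string.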
Feb 16 2013