digitalmars.D.bugs - regexp (reluctant)
- Fredrik Olsson (34/34) Aug 09 2005 Since no one has replied to my "export to .h" post, and Google yields no...
- Chris Sauls (7/12) Aug 09 2005 This is such a little thing, but you probably should use WYSIWYG string ...
- Walter (3/3) Aug 09 2005 Can you flesh it out a bit with some minimal sample text, the result you...
- Fredrik Olsson (37/42) Aug 10 2005 This is as short as I can get it with the behavior still intact:
- Walter (1/1) Aug 10 2005 Thanks, I can work with that.
Since no one has replied to my "export to .h" post, and Google yields no matches, I have begun coding a tool that turns D exports into a C header file. It seems fitting to write it in D. I would usually use Ruby, but hey, it is some practice at D and another resource for the community ;).

I have defined this regexp:

    const char[] redoccom = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*";

Or in more "readable" form:

    \s*(\/\*[\*\!].*?\*\/)?\s*

The documentation for regexp does not mention reluctant quantifiers. Usually (exp)* means "the largest match you can find of (exp)" (greedy) and (exp)*? means "the smallest match of (exp)" (reluctant). That is not the problem, although I think the documentation should state whether the *, ? and {} quantifiers are greedy or reluctant.

In other words, this is a simplistic match for an optional documentation comment in code, in one of these two forms (ignoring the content entirely, as long as there are no nested comments):

    /** Foo */
    /*! Bar */

with a capture group for the actual comment. It is useful, for example, as:

    new RegExp(redoccom ~ "export", "m");

to find exported members in a file along with their documentation.

Anyhow: with or without the reluctant quantifier, D does not give me the result I expect. I use SubEthaEdit with its default regexp syntax (Ruby) to verify the matches and the capture groups. (The only problem there is that with greedy quantifiers it matches from the start of the very first doc comment to the end of the very last comment, which is exactly what reluctant quantifiers are needed to prevent.)

Regards,
Fredrik Olsson
Aug 09 2005
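To make the greedy/reluctant distinction concrete, here is a minimal sketch in present-day D. It assumes a D2 compiler and Phobos's std.regex module (std.regexp, used in this thread, has since been removed from Phobos); the input string and the helper names docPrefix and firstExportDoc are hypothetical. Both documented exports sit on one line so that newline handling plays no part in the result.

```d
import std.regex;
import std.stdio;

// Hypothetical input: two documented exports on ONE line, so that "."
// (which does not match '\n' by default) does not affect the outcome.
enum source = "/** Foo */ export int a; /*! Bar */ export int b;";

// Doc-comment prefix in the spirit of redoccom (the optional escapes on
// '/' and '!' are dropped); chooses ".*?" (reluctant) or ".*" (greedy).
string docPrefix(bool reluctant)
{
    return `\s*(/\*[*!].` ~ (reluctant ? "*?" : "*") ~ `\*/)?\s*`;
}

// Returns capture group 1 (the doc comment) of the first "export" match.
string firstExportDoc(bool reluctant)
{
    auto c = matchFirst(source, regex(docPrefix(reluctant) ~ "export"));
    return c.empty ? null : c[1];
}

void main()
{
    writeln(firstExportDoc(false)); // /** Foo */ export int a; /*! Bar */
    writeln(firstExportDoc(true));  // /** Foo */
}
```

The greedy `.*` first runs to the end of the line and backtracks to the last `*/` that can still be followed by `\s*export`, so its capture swallows the first declaration as well; the reluctant `.*?` stops at the first such `*/`, which is the behaviour the doc-comment pattern relies on.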
Fredrik Olsson wrote:
> I have defined this regexp:
> const char[] redoccom = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*";
> Or in more "readable" form:
> \s*(\/\*[\*\!].*?\*\/)?\s*

This is such a little thing, but you probably should use WYSIWYG string literals for regexps, to help with readability. (I know I do.) Or:

--
Chris Sauls
Aug 09 2005
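As a sketch of the WYSIWYG suggestion (assuming a current D compiler; the constant names are illustrative only), the same pattern can be written with either of D's WYSIWYG literal forms, r"..." or backquotes, and a compile-time check confirms that both spell exactly the same string as the doubly escaped version.

```d
import std.stdio;

// The pattern from this thread, spelled three ways.
enum escaped  = "\\s*(\\/\\*[\\*\\!].*?\\*\\/)?\\s*"; // ordinary literal: each '\' doubled
enum rString  = r"\s*(\/\*[\*\!].*?\*\/)?\s*";       // WYSIWYG r"..." literal
enum backtick = `\s*(\/\*[\*\!].*?\*\/)?\s*`;        // WYSIWYG backquoted literal

// Checked at compile time: all three denote exactly the same characters.
static assert(escaped == rString && rString == backtick);

void main()
{
    writeln(backtick); // \s*(\/\*[\*\!].*?\*\/)?\s*
}
```

Inside a WYSIWYG literal a backslash is just a backslash, so the regexp reads the same in source as it does in a regexp tester.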
Can you flesh it out a bit with some minimal sample text, the result you get with regexp, and what the correct result should be? Also, can you try to simplify the regular expression as much as possible?
Aug 09 2005
Walter wrote:
> Can you flesh it out a bit with some minimal sample text, the result you
> get with regexp, and what the correct result should be? Also, can you try
> and simplify the regular expression as much as possible?

This is as short as I can get it with the behavior still intact:

    /* BEGIN: regexp.d */
    import std.regexp;
    import std.stdio;

    int main(char[][] args)
    {
        // Optional reluctant "*...*" group, surrounded by optional whitespace.
        RegExp re = new RegExp(r"\s*(\*.*?\*)?\s*", null);
        char[][] ms = re.match("*\n foo\n * bar");
        // Prints the whole match, then capture group 1, each in quotes.
        foreach (char[] m; ms)
        {
            writefln("'%s'", m);
        }
        return 0;
    }
    /* END: regexp.d */

And an actual compile/run session:

    peylow imanicken:~$ gdc regexp.d -o regexp; ./regexp
    ''
    ''
    peylow imanicken:~$

Expected compile/run:

    peylow imanicken:~$ gdc regexp.d -o regexp; ./regexp
    '*
     foo
     * '
    '*
     foo
     *'
    peylow imanicken:~$

If I remove the newlines from the string and search in "* foo * bar", then I correctly get:

    peylow imanicken:~$ gdc regexp.d -o regexp; ./regexp
    '* foo * '
    '* foo *'
    peylow imanicken:~$

So it seems that "." does not match every character: it misses newline.

Regards,
Fredrik Olsson
Aug 10 2005
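For comparison, a minimal sketch against Phobos's current std.regex (an assumption: this is not the std.regexp module tested above, and firstMatch is a hypothetical helper). There too "." does not match '\n' by default, so the plain pattern reproduces the '' '' result; the "s" (single-line) flag, or replacing "." with the class [\s\S], lets the reluctant group span the newlines.

```d
import std.regex;
import std.stdio;

// The input from the report: the "*...*" span crosses two newlines.
enum input = "*\n foo\n * bar";

// Hypothetical helper: [whole match, group 1] of the first match in input.
string[] firstMatch(string pattern, string flags = "")
{
    auto c = matchFirst(input, regex(pattern, flags));
    return [c[0], c[1]]; // an unmatched group comes back as an empty slice
}

void main()
{
    // As reported: "." stops at '\n', so the optional group matches nothing
    // and the whole match is empty.
    writeln(firstMatch(`\s*(\*.*?\*)?\s*`));
    // Workaround 1: the "s" (single-line) flag lets "." match '\n' as well.
    writeln(firstMatch(`\s*(\*.*?\*)?\s*`, "s"));
    // Workaround 2: the class [\s\S] matches any character, newline included.
    writeln(firstMatch(`\s*(\*[\s\S]*?\*)?\s*`));
}
```

The [\s\S] form has the advantage of not depending on a flag, so the same pattern text behaves identically in engines whose dot-matches-newline option is spelled differently.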