digitalmars.D.learn - Restrictions in std.regexp?
- Olaf Pohlmann (10/10) May 02 2006 Hi,
- Lionello Lunesu (4/13) May 02 2006 Use "AB(CD)EF" and re.match(1) ??
- Derek Parnell (24/33) May 02 2006 I can't tell what it is you are trying to do but it seems that the RE
- Olaf Pohlmann (20/22) May 02 2006 No. I'm looking for a string that is preceded and followed by well
- Olaf Pohlmann (13/14) May 02 2006 Oops, this is actually very close to the solution, just drop both '?'.
Hi,

the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:

    RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");

should find "CD" as a match, but it yields a runtime error:

    Error: *+? not allowed in atom

Is there any other way to get this working or am I just out of luck with the current implementation?

op
May 02 2006
Olaf Pohlmann wrote:
> Hi,
>
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
>
>     RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
>
> should find "CD" as a match, but it yields a runtime error:

Use "AB(CD)EF" and re.match(1) ??

I'm very inexperienced with regexp, mind you :S

L.
May 02 2006
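The capture-group suggestion above can be illustrated outside D. This is a minimal sketch using Python's `re` module as a stand-in for std.regexp (an assumption made only for illustration; the helper name `middle` is hypothetical). The surrounding "AB" and "EF" still have to match, but only group 1 is returned:

```python
import re

def middle(text):
    """Return what group 1 of AB(CD)EF captured, or None if nothing matches."""
    m = re.search(r"AB(CD)EF", text)
    return m.group(1) if m else None

print(middle("ABCDEF"))  # the captured middle part
print(middle("AXCDEF"))  # no match, because "AB" is required
```

Unlike a lookaround pattern, the overall match (group 0) still covers "ABCDEF"; the context is excluded only by reading group 1.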
On Tue, 02 May 2006 23:39:13 +1000, Olaf Pohlmann <op nospam.org> wrote:
> Hi,
>
> the documentation of std.regexp is somewhat sparse, so I tried to find out a few things on my own. There seems to be no way to do lookaheads and lookbehinds. This:
>
>     RegExp re = search("ABCDEF", "(?<=AB)CD(?=EF)");
>
> should find "CD" as a match, but it yields a runtime error:
>
>     Error: *+? not allowed in atom
>
> Is there any other way to get this working or am I just out of luck with the current implementation?

I can't tell what it is you are trying to do, but it seems that the RE syntax you are expecting is not what has been implemented. See http://www.digitalmars.com/ctg/regular.html for details.

Are you looking for an optional "AB" followed by "CD" followed by an optional "EF"? If so, try

    RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Here is a sample program ...

    import std.stdio;
    import std.regexp;

    void main()
    {
        RegExp re = search("AXCDEFGHI", "(AB)?(CD)(EF)?");
        writefln("PRE: %s", re.pre());
        writefln("MATCH: %s", re.match(0));
        writefln("SUB1: %s", re.match(1));
        writefln("SUB2: %s", re.match(2)); // this should be 'CD'
        writefln("SUB3: %s", re.match(3));
        writefln("POST: %s", re.post());
    }

--
Derek Parnell
Melbourne, Australia
May 02 2006
Derek Parnell wrote:
> Are you looking for an optional "AB" followed by "CD" followed by an optional "EF" ?

No. I'm looking for a string that is preceded and followed by well-defined other strings. The match should *not* return the whole sequence but only what is in the middle. It's actually about parsing some kind of text markup. If it were HTML like "<body><h1>Welcome</h1></body>", it should allow me to retrieve only the "Welcome". If you just use some grouping, the match will be the whole <h1> element, so you have to extract the content in a second step. The regexp with lookahead and lookbehind works fine in Python:

    import re
    html = "<body>\n<h1>Welcome</h1>\n</body>"
    match = re.search(r"(?<=\<h1\>).*?(?=\</h1\>)", html)
    html[match.start():match.end()]

This prints 'Welcome'. The regexp is a bit hard to read, so see http://docs.python.org/lib/re-syntax.html for a description.

Now, I can retrieve the whole h1 element with the D version of regexps and then do another scan for the content, but it would be nice to get it in one step, like in the Python version.

op
May 02 2006
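The difference between the two approaches discussed here can be seen side by side. This is a small sketch in Python only (it says nothing about what std.regexp supports); the variable names `look` and `group` are illustrative:

```python
import re

html = "<body>\n<h1>Welcome</h1>\n</body>"

# Lookaround: the tags are required context but are not part of the match,
# so the match itself (group 0) is only the text between them.
look = re.search(r"(?<=<h1>).*?(?=</h1>)", html)

# Capture group: the match spans the whole element; group 1 holds the content.
group = re.search(r"<h1>(.*?)</h1>", html)

print(look.group(0))   # content only
print(group.group(0))  # whole <h1> element
print(group.group(1))  # content only
```

Both searches run in a single step; with the capture group the content is read from group 1 rather than from the overall match, so no second scan is needed.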
Derek Parnell wrote:
> RegExp re = search("ABCDEF", "(AB)?(CD)(EF)?");

Oops, this is actually very close to the solution, just drop both '?'. It's even more readable than what I tried before:

    import std.stdio;
    import std.regexp;

    void main()
    {
        char[] html = "<body>\n<h1>Welcome</h1>\n</body>";
        RegExp re = search(html, r"(\<h1\>)(.*?)(\</h1\>)");
        if (re !is null)
            writefln("%s", re.match(2));
    }

op
May 02 2006