www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Nested RegEx

reply nrgyzer <nrgyzer gmail.com> writes:
Hi guys,

I'm trying to use std.regex to parse a string like the following:

string myString = "preOuter {if condition1} content1 {if condition2} content2
{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";

Is there any chance to use std.regex to parse the string above? I currently
used the following expression:

auto r = regex(`(.*?)\{if:(?P<condition>(.+?))\}(?P<content>(.*))(\{/if\})
(.*)`, "g");

but it doesn't fit my nested if- and else-statements correctly.

Thanks in advance for any suggestions!
Apr 21 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 21.04.2012 21:24, nrgyzer wrote:
 Hi guys,

 I'm trying to use std.regex to parse a string like the following:

 string myString = "preOuter {if condition1} content1 {if condition2} content2
 {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language. The way out is to either preconstruct regex pattern for a given maximum depth or just use handwritten parser (it may use regex for parts of input internally). One day I may add push-pop stack extensions (that allow this kind of thing) into regex but I'd have to think hard to make it efficient. (look at .NET regex they kind of have syntax for this but it's too underpowered for my taste)
 Is there any chance to use std.regex to parse the string above? I currently
 used the following expression:

 auto r = regex(`(.*?)\{if:(?P<condition>(.+?))\}(?P<content>(.*))(\{/if\})
 (.*)`, "g");

 but it doesn't fit my nested if- and else-statements correctly.

 Thanks in advance for any suggestions!
-- Dmitry Olshansky
Apr 21 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
 On 21.04.2012 21:24, nrgyzer wrote:
Hi guys,

I'm trying to use std.regex to parse a string like the following:

string myString = "preOuter {if condition1} content1 {if condition2} content2
{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language.
This is of theoretical interest to me. Very often I find myself wanting a concise way to express patterns with nested matching delimiters, but regexes can't do it. But to jump to a full-blown stack-based language seems like an overkill to me: stack languages can express *much* more than just nested delimiters, most of which is difficult to encapsulate in a nice, concise syntax like regexes. All I want is a simple way to express the kind of simple nesting that matching delimiters give. [...]
 One day I may add push-pop stack extensions (that allow this kind of
 thing) into regex but I'd have to think hard to make it efficient.
[...] Plus, have a nice concise syntax for it (otherwise I might as well just write a recursive descent parser for it in the first place). T -- The day Microsoft makes something that doesn't suck is probably the day they start making vacuum cleaners... -- Slashdotter
Apr 21 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 21.04.2012 22:46, H. S. Teoh wrote:
 On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
 On 21.04.2012 21:24, nrgyzer wrote:
 Hi guys,

 I'm trying to use std.regex to parse a string like the following:

 string myString = "preOuter {if condition1} content1 {if condition2} content2
 {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language.
This is of theoretical interest to me. Very often I find myself wanting a concise way to express patterns with nested matching delimiters, but regexes can't do it. But to jump to a full-blown stack-based language seems like an overkill to me: stack languages can express *much* more than just nested delimiters, most of which is difficult to encapsulate in a nice, concise syntax like regexes. All I want is a simple way to express the kind of simple nesting that matching delimiters give.
Recursive descent is not particularly bad unless minimal "grammar descent depth" is high. Example: a+b-c uses a quite a lot of recursive calls for grammar with e.g. 10+ operator precedence levels. I'm thinking of merging operator precedence parser with regex might be a happy marriage you know. Back to OP topic something along the lines of this will do it (beware of stack overflow): void parseIf(){ static int ifNest; if(input.startWith("{if")){ ifNest++; scope(exit) ifNest--; enforce(ifNest < 10000, "conservative stack overflow"); parseCond(input[2..$-1]);//regex that does condition enforce(input[$-1] == '}', "close that if"); parseIf();//parse body or another nested if //parseElse();//goes here, as does elif } else parseBody();// the regex you used before }
 [...]
 One day I may add push-pop stack extensions (that allow this kind of
 thing) into regex but I'd have to think hard to make it efficient.
[...] Plus, have a nice concise syntax for it (otherwise I might as well just write a recursive descent parser for it in the first place).
Concise syntax & lots of power is a priority actually, because I can detect if push-pop stuff is used or not and pick the right engine for the job. So different big-oh complexity is not important. -- Dmitry Olshansky
Apr 21 2012
parent nrgyzer <nrgyzer gmail.com> writes:
== Auszug aus Dmitry Olshansky (dmitry.olsh gmail.com)'s Artikel
 On 21.04.2012 22:46, H. S. Teoh wrote:
 On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
 On 21.04.2012 21:24, nrgyzer wrote:
 Hi guys,

 I'm trying to use std.regex to parse a string like the following:

 string myString = "preOuter {if condition1} content1 {if condition2}
content2
 {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language.
This is of theoretical interest to me. Very often I find myself wanting a concise way to express patterns with nested matching delimiters, but regexes can't do it. But to jump to a full-blown stack-based language seems like an overkill to me: stack languages can express *much* more than just nested delimiters, most of which is difficult to encapsulate in a nice, concise syntax like regexes. All I want is a simple way to express the kind of simple nesting that matching delimiters give.
Recursive descent is not particularly bad unless minimal "grammar descent depth" is high. Example: a+b-c uses a quite a lot of recursive calls for grammar with e.g. 10+ operator precedence levels. I'm thinking of merging operator precedence parser with regex might be a happy marriage you know. Back to OP topic something along the lines of this will do it (beware of stack overflow): void parseIf(){ static int ifNest; if(input.startWith("{if")){ ifNest++; scope(exit) ifNest--; enforce(ifNest < 10000, "conservative stack overflow"); parseCond(input[2..$-1]);//regex that does condition enforce(input[$-1] == '}', "close that if"); parseIf();//parse body or another nested if //parseElse();//goes here, as does elif } else parseBody();// the regex you used before }
 [...]
 One day I may add push-pop stack extensions (that allow this kind of
 thing) into regex but I'd have to think hard to make it efficient.
[...] Plus, have a nice concise syntax for it (otherwise I might as well just write a recursive descent parser for it in the first place).
Concise syntax & lots of power is a priority actually, because I can detect if push-pop stuff is used or not and pick the right engine for the job. So different big-oh complexity is not important.
Thanks guys... I just solved the problem by using indexOf() from std.string for parsing my if-statements. So... in principle, I can have unlimited nested if- statements, but I think it's much more difficult (and slower) than using any regex-expression.
Apr 24 2012