digitalmars.D.learn - Nested RegEx
- nrgyzer (10/10) Apr 21 2012 Hi guys,
- Dmitry Olshansky (13/23) Apr 21 2012 Simply put pure regex is incapable of arbitrary-nested if-else
- H. S. Teoh (16/30) Apr 21 2012 This is of theoretical interest to me. Very often I find myself wanting
- Dmitry Olshansky (29/55) Apr 21 2012 Recursive descent is not particularly bad unless minimal "grammar
- nrgyzer (6/64) Apr 24 2012 Thanks guys... I just solved the problem by using indexOf() from std.str...
Hi guys, I'm trying to use std.regex to parse a string like the following: string myString = "preOuter {if condition1} content1 {if condition2} content2 {elseif condition3} content3 {else}any other content{/if}{/if} postOuter"; Is there any chance to use std.regex to parse the string above? I currently used the following expression: auto r = regex(`(.*?)\{if:(?P<condition>(.+?))\}(?P<content>(.*))(\{/if\}) (.*)`, "g"); but it doesn't fit my nested if- and else-statements correctly. Thanks in advance for any suggestions!
Apr 21 2012
On 21.04.2012 21:24, nrgyzer wrote:Hi guys, I'm trying to use std.regex to parse a string like the following: string myString = "preOuter {if condition1} content1 {if condition2} content2 {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language. The way out is to either preconstruct regex pattern for a given maximum depth or just use handwritten parser (it may use regex for parts of input internally). One day I may add push-pop stack extensions (that allow this kind of thing) into regex but I'd have to think hard to make it efficient. (look at .NET regex they kind of have syntax for this but it's too underpowered for my taste)Is there any chance to use std.regex to parse the string above? I currently used the following expression: auto r = regex(`(.*?)\{if:(?P<condition>(.+?))\}(?P<content>(.*))(\{/if\}) (.*)`, "g"); but it doesn't fit my nested if- and else-statements correctly. Thanks in advance for any suggestions!-- Dmitry Olshansky
Apr 21 2012
On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:On 21.04.2012 21:24, nrgyzer wrote:This is of theoretical interest to me. Very often I find myself wanting a concise way to express patterns with nested matching delimiters, but regexes can't do it. But to jump to a full-blown stack-based language seems like an overkill to me: stack languages can express *much* more than just nested delimiters, most of which is difficult to encapsulate in a nice, concise syntax like regexes. All I want is a simple way to express the kind of simple nesting that matching delimiters give. [...]Hi guys, I'm trying to use std.regex to parse a string like the following: string myString = "preOuter {if condition1} content1 {if condition2} content2 {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language.One day I may add push-pop stack extensions (that allow this kind of thing) into regex but I'd have to think hard to make it efficient.[...] Plus, have a nice concise syntax for it (otherwise I might as well just write a recursive descent parser for it in the first place). T -- The day Microsoft makes something that doesn't suck is probably the day they start making vacuum cleaners... -- Slashdotter
Apr 21 2012
On 21.04.2012 22:46, H. S. Teoh wrote:On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:Recursive descent is not particularly bad unless minimal "grammar descent depth" is high. Example: a+b-c uses a quite a lot of recursive calls for grammar with e.g. 10+ operator precedence levels. I'm thinking of merging operator precedence parser with regex might be a happy marriage you know. Back to OP topic something along the lines of this will do it (beware of stack overflow): void parseIf(){ static int ifNest; if(input.startWith("{if")){ ifNest++; scope(exit) ifNest--; enforce(ifNest < 10000, "conservative stack overflow"); parseCond(input[2..$-1]);//regex that does condition enforce(input[$-1] == '}', "close that if"); parseIf();//parse body or another nested if //parseElse();//goes here, as does elif } else parseBody();// the regex you used before }On 21.04.2012 21:24, nrgyzer wrote:This is of theoretical interest to me. Very often I find myself wanting a concise way to express patterns with nested matching delimiters, but regexes can't do it. But to jump to a full-blown stack-based language seems like an overkill to me: stack languages can express *much* more than just nested delimiters, most of which is difficult to encapsulate in a nice, concise syntax like regexes. All I want is a simple way to express the kind of simple nesting that matching delimiters give.Hi guys, I'm trying to use std.regex to parse a string like the following: string myString = "preOuter {if condition1} content1 {if condition2} content2 {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language.[...]Concise syntax & lots of power is a priority actually, because I can detect if push-pop stuff is used or not and pick the right engine for the job. So different big-oh complexity is not important. -- Dmitry OlshanskyOne day I may add push-pop stack extensions (that allow this kind of thing) into regex but I'd have to think hard to make it efficient.[...] Plus, have a nice concise syntax for it (otherwise I might as well just write a recursive descent parser for it in the first place).
Apr 21 2012
== Auszug aus Dmitry Olshansky (dmitry.olsh gmail.com)'s ArtikelOn 21.04.2012 22:46, H. S. Teoh wrote:content2On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:On 21.04.2012 21:24, nrgyzer wrote:Hi guys, I'm trying to use std.regex to parse a string like the following: string myString = "preOuter {if condition1} content1 {if condition2}Thanks guys... I just solved the problem by using indexOf() from std.string for parsing my if-statements. So... in principle, I can have unlimited nested if- statements, but I think it's much more difficult (and slower) than using any regex-expression.Recursive descent is not particularly bad unless minimal "grammar descent depth" is high. Example: a+b-c uses a quite a lot of recursive calls for grammar with e.g. 10+ operator precedence levels. I'm thinking of merging operator precedence parser with regex might be a happy marriage you know. Back to OP topic something along the lines of this will do it (beware of stack overflow): void parseIf(){ static int ifNest; if(input.startWith("{if")){ ifNest++; scope(exit) ifNest--; enforce(ifNest < 10000, "conservative stack overflow"); parseCond(input[2..$-1]);//regex that does condition enforce(input[$-1] == '}', "close that if"); parseIf();//parse body or another nested if //parseElse();//goes here, as does elif } else parseBody();// the regex you used before }This is of theoretical interest to me. Very often I find myself wanting a concise way to express patterns with nested matching delimiters, but regexes can't do it. But to jump to a full-blown stack-based language seems like an overkill to me: stack languages can express *much* more than just nested delimiters, most of which is difficult to encapsulate in a nice, concise syntax like regexes. All I want is a simple way to express the kind of simple nesting that matching delimiters give.{elseif condition3} content3 {else}any other content{/if}{/if} postOuter";Simply put pure regex is incapable of arbitrary-nested if-else statements. In a sense if... /if is a case of balanced parens problem and it's widely know to form non-regular language.[...]Concise syntax & lots of power is a priority actually, because I can detect if push-pop stuff is used or not and pick the right engine for the job. So different big-oh complexity is not important.One day I may add push-pop stack extensions (that allow this kind of thing) into regex but I'd have to think hard to make it efficient.[...] Plus, have a nice concise syntax for it (otherwise I might as well just write a recursive descent parser for it in the first place).
Apr 24 2012