digitalmars.D.learn - Regex-Fu

"Chris" <wendlec tcd.ie> writes:
I'm a bit at a loss here. I cannot get the longest possible 
match. I tried several versions with eager operators and stuff, 
but D's regex engine(s) always seem to return the shortest match. 
Is there something embarrassingly simple I'm missing?

void main()
{
   import std.regex : regex, matchFirst;
   import std.stdio : writeln;

   auto word = "blablahula";
   auto m = matchFirst(word, regex("^([a-z]+)(hula|ula)$"));
   writeln(m);  // prints ["blablahula", "blablah", "ula"]
}

I want it to return "hula" not "ula".
May 25 2015
"Namespace" <rswhite4 gmail.com> writes:
Make the + quantifier lazy (non-greedy) by writing +? instead: matchFirst(word, regex("^([a-z]+?)(hula|ula)$"));
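
To see the difference side by side, here is a minimal sketch. The helper suffixOf is just a name made up for this illustration, not part of std.regex; it assumes the suffix is always capture group 2, as in your pattern.

```d
import std.regex : regex, matchFirst;
import std.stdio : writeln;

/// Hypothetical helper: returns whatever capture group 2 matched,
/// or null if `word` does not match `pattern` at all.
string suffixOf(string word, string pattern)
{
    auto m = matchFirst(word, regex(pattern));
    return m.empty ? null : m[2];
}

void main()
{
    auto word = "blablahula";

    // Greedy +: group 1 grabs as much as it can and only gives back
    // characters until the rest of the pattern fits, so it keeps the "h".
    writeln(suffixOf(word, "^([a-z]+)(hula|ula)$"));   // prints ula

    // Lazy +?: group 1 takes as little as it can, so the longer
    // alternative "hula" gets a chance to match first.
    writeln(suffixOf(word, "^([a-z]+?)(hula|ula)$"));  // prints hula
}
```

In both cases the overall match is the whole word; the only thing that changes is where the boundary between the two groups falls.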
May 25 2015
"novice2" <sorryno em.ail> writes:
 I cannot get the longest possible
It does match the longest possible string, but for the first group ([a-z]+). Try ^([a-z]+?)(hula|ula)$
May 25 2015
"Chris" <wendlec tcd.ie> writes:
On Monday, 25 May 2015 at 11:20:46 UTC, novice2 wrote:
 I cannot get the longest possible
It does match the longest possible string, but for the first group ([a-z]+). Try ^([a-z]+?)(hula|ula)$
Namespace, novice2: Ah, I see. The problem was that the first group was too greedy, not the second. I was focusing on the latter. Thanks, this works now!
May 25 2015