digitalmars.D.learn - std.regex with multiple matches

David Gileadi <gileadis NSPMgmail.com> writes:

I was using std.regex yesterday, matching a regular expression against a 
string with the "g" flag to find multiple matches.  As the example from 
the docs shows (BTW I think the example may be wrong; I think it needs 
the "g" flag added to the regex call), you can do a foreach loop on the 
matches like:

foreach(m; match("abcabcabab", regex("ab")))
{
     writefln("%s[%s]%s", m.pre, m.hit, m.post);
}

Each match "m" is a RegexMatch, which includes .pre, .hit, and .post 
properties to return ranges of everything before, inside, and after the 
match.
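For reference, a minimal sketch of the same loop with the "g" flag passed to regex(), which is the change suggested above. The helper name markMatches is hypothetical, and the sketch assumes a std.regex in which each loop element exposes .pre, .hit, and .post as described:

```d
import std.format : format;
import std.regex;
import std.stdio;

// Hypothetical helper (not part of std.regex): one "pre[hit]post" line per match.
string[] markMatches(string input, string pattern)
{
    string[] lines;
    // The "g" flag requests every match rather than only the first.
    foreach (m; match(input, regex(pattern, "g")))
        lines ~= format("%s[%s]%s", m.pre, m.hit, m.post);
    return lines;
}

void main()
{
    foreach (line; markMatches("abcabcabab", "ab"))
        writeln(line);
}
```

Note that .pre and .post always cover everything before and after the current match, measured against the whole input, not just the text since the previous match.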

However what I really wanted was a way to get the range between matches, 
i.e. since I had multiple matches I wanted something like m.upToNextMatch.

Since I'm not very familiar with ranges, am I missing some obvious way 
of doing this with the existing .pre, .hit and .post properties?

-Dave

Apr 21 2011

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

I might be wrong but I think that you are looking for std.regex.splitter:

import std.algorithm : equal;
import std.regex;

auto s1 = ", abc, de,  fg, hi,";
assert(equal(splitter(s1, regex(", *")),
    ["", "abc", "de", "fg", "hi", ""]));

Simply put, it gives you a range of slices of the input, separated by the regex matches.
-- 
Dmitry Olshansky

Apr 21 2011

David Gileadi <gileadis NSPMgmail.com> writes:

I considered that but I also need the content of the matches--the 
captures, etc.

Apr 21 2011

Kai Meyer <kai unixlords.com> writes:

There are two ways I can think of off the top of my head.

I don't think D supports "look ahead", but if it did you could match 
something, then capture the portion afterwards (in m.captures[1]) that 
matches everything up until the look ahead (which is what you matched in 
the first place).
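A rough sketch of that first idea, assuming a Phobos release whose std.regex accepts the lookahead syntax (?=...), which the current std.regex documentation lists. The helper name textAfterEachMatch is hypothetical, and "\s" stands in for whatever you are really matching:

```d
import std.regex;
import std.stdio;

// Hypothetical helper: for each separator match, return the text that
// follows it, up to (but not including) the next separator or end of input.
string[] textAfterEachMatch(string input)
{
    string[] parts;
    // \s        the thing being matched
    // (.*?)     capture group 1: the text after it, as short as possible
    // (?=\s|$)  lookahead: stop before the next match or at end of input
    auto re = regex(r"\s(.*?)(?=\s|$)", "g");
    foreach (m; match(input, re))
        parts ~= m[1]; // m[1] is capture group 1
    return parts;
}

void main()
{
    foreach (part; textAfterEachMatch("the quick brown fox"))
        writefln("after match: '%s'", part);
}
```

This only yields the text after each match, so whatever precedes the first match still has to come from .pre of the first match.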

Otherwise, you could manually capture the ranges like this (it matches the 
first word character after each word boundary, then prints the remaining 
portion of the word up to the next word boundary followed by a word 
character):

import std.stdio;
import std.regex;

void main()
{
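     size_t last_pos;   // start index of the previous match
     size_t last_size;  // length of the previous match
     string abc = "the quick brown fox jumped over the lazy dog";
     // The "g" flag makes the loop visit every match, not just the first.
     foreach(m; match(abc, regex(r"\b\w", "g")))
     {
         // Slice from the end of the previous match to the start of this one.
         writefln("between: '%s'", abc[last_pos + last_size..m.pre.length]);
         writefln("%s[%s]%s", m.pre, m.hit, m.post);
         last_size = m.hit.length;
         last_pos = m.pre.length;
     }
     // Whatever follows the final match.
     writefln("between: '%s'", abc[last_pos + last_size..$]);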
}
// Prints:
// between: ''
// [t]he quick brown fox jumped over the lazy dog
// between: 'he '
// the [q]uick brown fox jumped over the lazy dog
// between: 'uick '
// the quick [b]rown fox jumped over the lazy dog
// between: 'rown '
// the quick brown [f]ox jumped over the lazy dog
// between: 'ox '
// the quick brown fox [j]umped over the lazy dog
// between: 'umped '
// the quick brown fox jumped [o]ver the lazy dog
// between: 'ver '
// the quick brown fox jumped over [t]he lazy dog
// between: 'he '
// the quick brown fox jumped over the [l]azy dog
// between: 'azy '
// the quick brown fox jumped over the lazy [d]og
// between: 'og'

If you replace '\b\w' with '\s', the output should help illuminate how it works:

between: 'the'
the[ ]quick brown fox jumped over the lazy dog
between: 'quick'
the quick[ ]brown fox jumped over the lazy dog
between: 'brown'
the quick brown[ ]fox jumped over the lazy dog
between: 'fox'
the quick brown fox[ ]jumped over the lazy dog
between: 'jumped'
the quick brown fox jumped[ ]over the lazy dog
between: 'over'
the quick brown fox jumped over[ ]the lazy dog
between: 'the'
the quick brown fox jumped over the[ ]lazy dog
between: 'lazy'
the quick brown fox jumped over the lazy[ ]dog
between: 'dog'

Apr 21 2011

David Gileadi <gileadis NSPMgmail.com> writes:

(snip an excellent explanation)

Ahh yes, that's a good way of doing it--track the lengths and slice the 
original array to get the "betweens".  Thanks for the insight!
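For anyone who wants the "betweens" as a reusable piece, here is a minimal sketch of the same length-tracking idea wrapped in a function. The name betweens is hypothetical (it is not part of std.regex), and the "g" flag is passed so that every match is visited:

```d
import std.regex;
import std.stdio;

// Hypothetical helper: returns the slices of input that lie before,
// between, and after the matches of pattern.
string[] betweens(string input, string pattern)
{
    string[] parts;
    size_t prevEnd = 0; // index just past the previous match
    foreach (m; match(input, regex(pattern, "g")))
    {
        // m.pre.length is the index where the current match starts.
        parts ~= input[prevEnd .. m.pre.length];
        prevEnd = m.pre.length + m.hit.length;
        // m.hit and any capture groups are still available here if needed.
    }
    parts ~= input[prevEnd .. $]; // text after the final match
    return parts;
}

void main()
{
    foreach (part; betweens("the quick brown fox", r"\s"))
        writefln("between: '%s'", part);
}
```

Unlike splitter, the loop body still has the full match object in hand, so the captures can be collected alongside the slices.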

Apr 21 2011
