digitalmars.D.learn - Capture offset of matches in std.regex.matchAll?

reply "JD" <jd jd.com> writes:
I'm using a compile time regex to find some tags in an input
string. Is it possible to capture the offset of the matches in
some way? Otherwise I have to "calculate" the offsets myself by
iterating over the results of matchAll.

Thanks,
Jeroen

---

Example code:

import std.stdio;
import std.regex;

void main()
{
	auto input = "<html><body><p>{{ message }}</p></body></html>";
	


	auto matches = matchAll(input, ctr);

	/*
	auto offset = 0;
	foreach (match; matches)
	{
		writeln(offset, ":", match);
		++offset;
	}
	*/
}
Jul 07 2014
next sibling parent "JR" <zorael gmail.com> writes:
On Monday, 7 July 2014 at 21:32:30 UTC, JD wrote:
 I'm using a compile time regex to find some tags in an input
 string. Is it possible to capture the offset of the matches in
 some way? Otherwise I have to "calculate" the offsets myself by
 iterating over the results of matchAll.

 Thanks,
 Jeroen
I believe what matchAll returns evaluates its .front lazily, so aye: you need to pop it until you get to the match you want. :< (Assuming I understand the question correctly.)

You can, however, index the *captured fields* in a specific match. I couldn't wrap my head around your example pattern, but see http://dpaste.dzfl.pl/f693db93c3a4 for a dumbed-down version.

You can't slice a match, nor can you have foreach provide an index variable. This may be so that foreach can include named fields? Not sure.
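
A minimal sketch of popping the lazy range and indexing the captured fields of each match. The pattern and the helper name tagNames are assumptions made up for illustration; the pattern from the original post is not shown in this thread.

```d
import std.regex;
import std.stdio;

// Hypothetical pattern (an assumption, not the one from the original post):
// matches a {{ name }} tag and captures the name as group 1.
enum tagPattern = `\{\{\s*(\w+)\s*\}\}`;

// Collects group 1 of every match by popping the lazy range front to back.
string[] tagNames(string input)
{
    string[] names;
    auto matches = matchAll(input, ctRegex!tagPattern);
    while (!matches.empty)
    {
        auto m = matches.front; // one match, evaluated on demand
        names ~= m[1];          // m[0] is the whole match, m[1] the first group
        matches.popFront();
    }
    return names;
}

void main()
{
    auto input = "<html><body><p>{{ message }}</p></body></html>";
    writeln(tagNames(input)); // prints ["message"]
}
```

A plain foreach over the result of matchAll does the same front/popFront walk; the explicit loop is only there to make the laziness visible.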
Jul 08 2014
prev sibling next sibling parent reply Justin Whear <justin economicmodeling.com> writes:
On Mon, 07 Jul 2014 21:32:29 +0000, JD wrote:

 I'm using a compile time regex to find some tags in an input string. Is
 it possible to capture the offset of the matches in some way? Otherwise
 I have to "calculate" the offsets myself by iterating over the results
 of matchAll.
 
 Thanks,
 Jeroen
 
 ---
 
 Example code:
 
 import std.stdio;
 import std.regex;
 
 void main()
 {
 	auto input = "<html><body><p>{{ message }}</p></body></html>";
 	

 
 	auto matches = matchAll(input, ctr);
 
            /*
 	auto offset = 0;
 	foreach(match;matches)
 	{
 		writeln(offset, ":", match);
 		++offset;
 	}
           */
 }
What do you mean by offset? If you simply mean the index of the match, as your example seems to indicate, you can zip the matches with iota or sequence!"n".

If you want the offset in the string where each match begins, I think you're out of luck. I needed something similar a while ago and the best I could find was using the cumulative length of the pre property.
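
Both ideas can be sketched roughly as follows. The pattern and the helper name matchOffsets are hypothetical. The sketch also assumes that pre is the slice from the start of the whole input up to the current match, which is how it appears to behave in recent Phobos releases; if pre were instead relative to the previous match, the lengths would have to be summed as described above.

```d
import std.range : iota, zip;
import std.regex;
import std.stdio;

// Hypothetical pattern (an assumption, not the one from the original post).
enum tagPattern = `\{\{\s*(\w+)\s*\}\}`;

// Offset of each match, taken as the length of the text before it.
// Assumes pre spans from the start of the input to the match.
size_t[] matchOffsets(string input)
{
    size_t[] offsets;
    foreach (m; matchAll(input, ctRegex!tagPattern))
        offsets ~= m.pre.length;
    return offsets;
}

void main()
{
    auto input = "<p>{{ a }}</p><p>{{ b }}</p>";

    // Index of each match: zip the matches with a counter.
    foreach (i, m; zip(iota(size_t.max), matchAll(input, ctRegex!tagPattern)))
        writeln(i, ": ", m.hit);

    // Offset of each match in the input string.
    writeln(matchOffsets(input));
}
```

The offsets are counted in code units (bytes for a string), not in characters.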
Jul 08 2014
parent "JD" <jd jd.com> writes:
On Tuesday, 8 July 2014 at 15:58:47 UTC, Justin Whear wrote:
 What do you mean by offset? If you simply mean the index of the
 match, as your example seems to indicate, you can zip the matches
 with iota or sequence!"n".

 If you want the offset in the string where each match begins, I
 think you're out of luck. I needed something similar a while ago
 and the best I could find was using the cumulative length of the
 pre property.
Sorry for my confusing example! Yes, I was looking for the offset in the string where the matches begin.

I did some programming in PHP in the past. Its preg_match_all function has an optional offset_capture flag. I was hoping for something similar in std.regex...

Good tip, I'll use the cumulative length of pre. Thank you both for your replies!
Jul 08 2014
prev sibling parent Lewis <musicaljelly gmail.com> writes:
On Monday, 7 July 2014 at 21:32:30 UTC, JD wrote:
 I'm using a compile time regex to find some tags in an input
 string. Is it possible to capture the offset of the matches in
 some way? Otherwise I have to "calculate" the offsets myself by
 iterating over the results of matchAll.

 Thanks,
 Jeroen
For anyone coming to this later: I believe the captured strings are all slices of the original input, so you can do 'capture[1].ptr - originalString.ptr' to get the index at which the capture starts.
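
A minimal sketch of the pointer subtraction, assuming the matches really are slices of the input string. The helper name offsetIn and the pattern are made up for illustration and are not part of std.regex or of the original post.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: offset of a slice within the string it was cut from.
// Only meaningful when part actually points into whole.
size_t offsetIn(string part, string whole)
{
    assert(part.ptr >= whole.ptr &&
           part.ptr + part.length <= whole.ptr + whole.length,
           "part is not a slice of whole");
    return part.ptr - whole.ptr;
}

void main()
{
    auto input = "<html><body><p>{{ message }}</p></body></html>";

    // Hypothetical pattern (an assumption, not the one from the original post).
    auto ctr = ctRegex!(`\{\{\s*(\w+)\s*\}\}`);

    foreach (m; matchAll(input, ctr))
    {
        writeln(offsetIn(m.hit, input), ": ", m.hit); // 15: {{ message }}
        writeln(offsetIn(m[1], input), ": ", m[1]);   // 18: message
    }
}
```

The result is counted in code units (bytes for a string), so it can be used directly to slice the input, but it is not a character count when the text contains non-ASCII characters.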
Sep 25 2020