digitalmars.D.learn - find regex in backward direction ?

reply Виталий Фадеев writes:
We have:
     dstring s = "abc3abc7";

Source:
     https://run.dlang.io/is/PtjN4T

Goal:
     size_t pos = findRegexBackward( r"abc"d );
     assert( pos == 4 );


How can I find a regex match when searching in the backward direction?
Dec 19 2020
parent reply kdevel <kdevel vogtner.de> writes:
On Saturday, 19 December 2020 at 12:52:54 UTC, Виталий Фадеев 
wrote:
 Goal:
     size_t pos = findRegexBackward( r"abc"d );
     assert( pos == 4 );
module LastOccurrence;

size_t findRegexBackward_1 (dstring s, dstring pattern)
{
   import std.regex : matchAll;
   auto results = matchAll (s, pattern);
   if (results.empty)
      throw new Exception ("could not match");
   size_t siz;
   foreach (rm; results)
      siz = rm.pre.length;
   return siz;
}

size_t findRegexBackward_2 (dstring s, dstring pattern)
// this does not work with irreversible patterns ...
{
   import std.regex : matchFirst;
   import std.array : array;
   import std.range: retro;
   auto result = matchFirst (s.retro.array, pattern.retro.array);
   if (result.empty)
      throw new Exception ("could not match");
   return result.post.length;
}

unittest
{
   import std.exception : assertThrown;
   static foreach (f; [&findRegexBackward_1, &findRegexBackward_2])
   {
      assert (f ("abc3abc7", r""d) == 8);
      assert (f ("abc3abc7", r"abc"d) == 4);
      assertThrown (f ("abc3abc7", r"abx"d));
      assert (f ("abababababab", r"ab"d) == 10);
   }
}
Dec 19 2020
parent reply Виталий Фадеев writes:
On Saturday, 19 December 2020 at 23:16:18 UTC, kdevel wrote:
 On Saturday, 19 December 2020 at 12:52:54 UTC, Виталий Фадеев 
 wrote:
 Goal:
     size_t pos = findRegexBackward( r"abc"d );
     assert( pos == 4 );
Thanks, but this is not perfect. We can't simply reverse the pattern, because "ab\w" becomes "w\ba" (it is expected to match "abc", whose reversal is "cba").
 size_t findRegexBackward_2 (dstring s, dstring pattern)
 ...
    assert (f ("abc3abc7", r"ab\w"d) == 4);
 ...
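To make the objection concrete, here is a minimal sketch of what a character-wise reversal does to a pattern. The helper name `reversed` is hypothetical and only illustrates the `retro`-based approach from findRegexBackward_2; it is not part of any library.

```d
import std.array : array;
import std.range : retro;

/// Hypothetical helper: reverses a pattern character by character,
/// exactly as `pattern.retro.array` does in findRegexBackward_2.
dstring reversed(dstring p)
{
    return p.retro.array.idup;
}

unittest
{
    // A plain literal reverses cleanly.
    assert(reversed("abc"d) == "cba"d);
    // The escape `\w` is split apart: the result reads `w`, then `\b`, then `a`,
    // which is a different pattern, not the mirror image of the original.
    assert(reversed(r"ab\w"d) == r"w\ba"d);
}
```

The second assertion shows why the reversed pattern no longer matches "cba": the backslash ends up in front of `b` instead of `w`.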
Of course, I am using matchAll, but it scans the whole text in the forward direction.
   size_t findRegexBackward_1 (dstring s, dstring pattern)
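/**
 * Returns the start index of the last match of `needle` in `s` and stores
 * its length in `matchedLength`. Returns size_t.max (i.e. -1) if nothing matches.
 */
size_t findRegexBackwardMatchCase( dstring s, dstring needle, out size_t matchedLength )
{
    import std.regex : matchAll;

    auto matches = matchAll( s, needle );

    if ( matches.empty )
    {
        return size_t.max; // not found; same value as the original "return -1"
    }
    else
    {
        // Walk all matches in forward direction and keep the last one.
        auto last = matches.front;
        foreach ( m; matches )
        {
            last = m;
        }
        matchedLength = last.hit.length;
        return last.pre.length;
    }
}

Thanks! I am looking for the fastest solution. Maybe something like the "RightToLeft" option in the .NET regex API:
https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regexoptions?view=net-5.0#System_Text_RegularExpressions_RegexOptions_RightToLeft
But how can this be done on Linux? Are the Microsoft and Linux regex engines identical?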
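One possible way to avoid walking every match from the start is to try an anchored match at each start position, beginning at the end of the text. The sketch below is an assumption, not something proposed in this thread: the name `findLastStart` is hypothetical, and it relies on `^` matching only at the beginning of the sliced input.

```d
import std.regex : regex, matchFirst;

/// Hypothetical helper: returns the greatest index at which `pattern`
/// matches in `s`, or size_t.max if there is no match.
size_t findLastStart(dstring s, dstring pattern, out size_t matchedLength)
{
    // Anchor the pattern so each attempt tests exactly one start position.
    auto re = regex("^(?:"d ~ pattern ~ ")"d);

    // Visit start positions s.length, s.length - 1, ..., 0.
    for (size_t i = s.length + 1; i-- > 0; )
    {
        auto m = matchFirst(s[i .. $], re);
        if (!m.empty)
        {
            matchedLength = m.hit.length;
            return i;
        }
    }
    return size_t.max; // not found
}

unittest
{
    size_t len;
    assert(findLastStart("abc3abc7"d, r"ab\w"d, len) == 4);
    assert(len == 3);
    assert(findLastStart("abc3abc7"d, r"abx"d, len) == size_t.max);
}
```

Two caveats apply. The result is the rightmost start position, which can differ from the last non-overlapping match that matchAll reports (for "aaa" and the pattern "aa" this sketch returns 1, while matchAll yields only the match at 0). Slicing also hides the preceding text from the engine, so patterns that depend on it (look-behind, `\b` at the start) may behave differently. The search stops early when a match lies near the end, but the worst case is still one match attempt per position.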
Dec 19 2020
parent Виталий Фадеев writes:
On Sunday, 20 December 2020 at 04:33:21 UTC, Виталий Фадеев wrote:
 On Saturday, 19 December 2020 at 23:16:18 UTC, kdevel wrote:
 On Saturday, 19 December 2020 at 12:52:54 UTC, Виталий Фадеев 
 wrote:
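...
"retro" works when the expression is a simple literal such as "abc". A more complex expression such as "ab\w" or "(?P<name>regex)" would first have to be parsed into tokens, I think: [ "a", "b", "\w" ], [ "(", "?", "P", "<name>", "regex", ")" ], ... so that whole tokens, not single characters, are reversed. Up.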
Dec 19 2020
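The tokenizing idea from the last post can be sketched for the simplest case only. The function name `reverseSimplePattern` is hypothetical, and the sketch assumes a pattern made of literal characters and two-character escapes such as `\w`; groups, quantifiers, character classes and alternation are deliberately not handled.

```d
/// Hypothetical helper: reverses a pattern token by token, where a token is
/// either a single character or a backslash escape of two characters.
dstring reverseSimplePattern(dstring p)
{
    dstring[] tokens;
    for (size_t i = 0; i < p.length; )
    {
        // Keep a backslash together with the character it escapes.
        size_t n = (p[i] == '\\' && i + 1 < p.length) ? 2 : 1;
        tokens ~= p[i .. i + n];
        i += n;
    }

    dstring result;
    foreach_reverse (t; tokens)
        result ~= t;
    return result;
}

unittest
{
    assert(reverseSimplePattern("abc"d) == "cba"d);
    // The escape stays intact, so the result can match the reversed text "cba".
    assert(reverseSimplePattern(r"ab\w"d) == r"\wba"d);
}
```

A pattern such as `a+b` would still be reversed incorrectly (to `b+a`), so a general solution needs a real regex parser rather than this token split.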