digitalmars.D - New regex: Find?

dsimcha <dsimcha yahoo.com> writes:
Is there an *efficient* way to simply test whether a given string contains a
given regex in the new std.regex?  Using match() and testing for empty works,
but this apparently triggers a bunch of unnecessary heap allocation.  If not,
is this a universal enough feature to warrant an enhancement request?
May 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
> Is there an *efficient* way to simply test whether a given string contains a
> given regex in the new std.regex?  Using match() and testing for empty works,
> but this apparently triggers a bunch of unnecessary heap allocation.  If not,
> is this a universal enough feature to warrant an enhancement request?

If you only search once, there will be allocation. However, if you search for the same regex several times, there will be no extra allocation, so the cost will be amortized.

Andrei
May 04 2009
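
The reuse pattern described in the reply above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the input lines are hypothetical, and it assumes match() returns a result with an `empty` property, as the original question describes.

```d
import std.regex;
import std.stdio;

void main()
{
    // Build the regex engine once; this is where the up-front allocation happens.
    auto r = regex("is.only");

    // Hypothetical inputs, chosen only for illustration.
    string[] lines = [
        "This is only a test.",
        "Nothing to see here.",
        "Repeat, this is only a test.",
    ];

    foreach (line; lines)
    {
        // Reuse the same compiled regex for every search.
        bool found = !match(line, r).empty;
        writeln(found, ": ", line);
    }
}
```

Whether the per-call cost is actually zero depends on the std.regex implementation in use; the later posts in this thread report that it was not at the time.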
Derek Parnell <derek psych.ward> writes:
On Mon, 04 May 2009 10:09:56 -0500, Andrei Alexandrescu wrote:

> If you only search once, there will be allocation. However, if you search for the same regex several times there will be no extra allocation so the cost will be amortized.

Translation: No, there isn't "an *efficient* way".

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
> Translation: No, there isn't "an *efficient* way".

I think your translation omits important information. I meant exactly what I said: one isolated search currently can't be helped. Repeated searches can. This is because one search triggers the construction of a regex engine, which in turn allocates memory.

Andrei
May 04 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 04 May 2009 16:07:58 -0500, Andrei Alexandrescu wrote:

> I think your translation omits important information. I meant exactly what I said: one isolated search currently can't be helped. Repeated searches can. This is because one search triggers the construction of a regex engine, which in turn allocates memory.

I know you meant exactly what you said. I did understand the concept that you were putting forward. However, you didn't actually answer the question. Your answer sounds as if it came from a politician.

Maybe a compromise then (more polly talk) ... It is not efficient if you are doing one (or a few) finds; however, when doing many finds using the same regex, it becomes more and more efficient the more you use it.

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 04 2009
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Derek Parnell wrote:
> I know you meant exactly what you said. I did understand the concept that you were putting forward. However, you didn't actually answer the question. Your answer sounds as if it came from a politician.

I emphatically think not, as my answer was precise and did not try to hide anything. Oh, whatever.

Andrei
May 04 2009
Derek Parnell <derek psych.ward> writes:
On Mon, 04 May 2009 16:45:35 -0500, Andrei Alexandrescu wrote:

>> Your answer sounds as if it came from a politician.
> I emphatically think not, as my answer was precise and did not try to
> hide anything. Oh, whatever.

I apologize without reservation. I'm the type of person who thinks that if an answer can be stated as yes or no, but with some qualifications, then the answer ought to be given as yes or no followed by the "however" part. But that's just me, going by the response I get from most everyone I talk to. :-)

--
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell
May 04 2009
Georg Wrede <georg.wrede iki.fi> writes:
dsimcha wrote:
> Is there an *efficient* way to simply test whether a given string contains a
> given regex in the new std.regex?  Using match() and testing for empty works,
> but this apparently triggers a bunch of unnecessary heap allocation.  If not,
> is this a universal enough feature to warrant an enhancement request?

Your question can be understood in several ways. Others have answered one of these.

OTOH, if you just have two strings, out of which the shorter one is a regex, and you want to know whether it is included in the other string, then I suggest using std.string.find, which doesn't seem to cause heap allocation and is very fast.
May 04 2009
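
For the plain-substring case suggested in the post above, a rough sketch might look like the following. It is an illustration under assumptions, not code from the thread: it uses std.string.indexOf, which current Phobos provides, because std.string.find as named in the post may not be available under that name in newer releases. The haystack and needle values are hypothetical.

```d
import std.string : indexOf;
import std.stdio;

void main()
{
    string haystack = "This is only a test. Repeat, this is only a test.";
    string needle = "is only";   // plain text, not a regex pattern

    // indexOf returns the position of the first occurrence, or -1 if absent.
    auto pos = indexOf(haystack, needle);
    writeln(pos >= 0 ? "found" : "not found");
}
```

This only applies when the search term contains no regex metacharacters that are meant to be interpreted; a pattern such as "is.only" would be searched for literally, dot included.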
"Joel C. Salomon" <joelcsalomon gmail.com> writes:
dsimcha wrote:
> Is there an *efficient* way to simply test whether a given string contains a
> given regex in the new std.regex?  Using match() and testing for empty works,
> but this apparently triggers a bunch of unnecessary heap allocation.  If not,
> is this a universal enough feature to warrant an enhancement request?

You mean to search for a regex match without constructing the regex engine? Good luck with that. —Joel Salomon
May 04 2009
dsimcha <dsimcha yahoo.com> writes:
== Quote from Joel C. Salomon (joelcsalomon gmail.com)'s article
> You mean to search for a regex match without constructing the regex engine? Good luck with that.

Actually, the behavior Andrei describes is what I wanted: One allocation to construct the regex engine, amortized. However, contrary to what Andrei said, match() apparently allocates additional memory on each call. The following program leaks memory like a sieve when the GC is disabled:

    import std.regex, core.memory;

    void main() {
        string s = "This is only a test. Repeat, this is only a test.";
        auto r = regex("is.only");   // engine constructed once, up front
        GC.disable;                  // stop collections so growth is visible
        while(true) {
            auto m = match(s, r);    // each call appears to allocate again
        }
    }
May 04 2009
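
To put a number on the per-call allocation reported above, a bounded variant of that program could be sketched as below. This is an assumption-laden illustration rather than anything posted in the thread: it relies on GC.stats() from core.memory, which exists in recent druntime releases but not in the 2009 runtime, and the iteration count is arbitrary.

```d
import core.memory : GC;
import std.regex;
import std.stdio;

void main()
{
    string s = "This is only a test. Repeat, this is only a test.";
    auto r = regex("is.only");

    GC.disable();                            // suppress collections while measuring
    immutable before = GC.stats().usedSize;  // GC heap bytes in use before the loop

    enum iterations = 100_000;               // arbitrary bound, for illustration
    foreach (i; 0 .. iterations)
    {
        auto m = match(s, r);
    }

    immutable after = GC.stats().usedSize;   // GC heap bytes in use after the loop
    GC.enable();

    writeln("GC bytes in use grew by ", after - before,
            " over ", iterations, " calls");
}
```

This only observes memory obtained through the garbage collector; allocations made by other means would not show up in the figure.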
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
> Actually, the behavior Andrei describes is what I wanted: One allocation to construct the regex engine, amortized. However, contrary to what Andrei said, match() apparently allocates additional memory on each call.

Ah... Sigh, I meant to implement the short string optimization in the regex range, and put it off forever. It was about time it came back to haunt me :o). Could you please submit an enhancement request to Bugzilla? Maybe with a patch? :o)

Andrei
May 04 2009